NONPARAMETRIC ESTIMATION OF AN ADDITIVE MODEL 
WITH A LINK FUNCTION 

By Joel L. Horowitz 1 and Enno Mammen 2 

Northwestern University and Universität Mannheim 

This paper describes an estimator of the additive components of 
a nonparametric additive model with a known link function. When 
the additive components are twice continuously differentiable, the 
estimator is asymptotically normally distributed with a rate of 
convergence in probability of $n^{-2/5}$. This is true regardless of the 
(finite) dimension of the explanatory variable. Thus, in contrast to the 
existing asymptotically normal estimator, the new estimator has no curse 
of dimensionality. Moreover, the estimator has an oracle property: 
the asymptotic distribution of each additive component is the same 
as it would be if the other components were known with certainty. 

1. Introduction. This paper is concerned with nonparametric estimation 
of the functions $m_1, \ldots, m_d$ in the model 

(1.1)  $Y = F[\mu + m_1(X^1) + \cdots + m_d(X^d)] + U,$ 

where $X^j$ ($j = 1, \ldots, d$) is the $j$th component of the random vector 
$X \in \mathbb{R}^d$ for some finite $d \geq 2$, $F$ is a known function, $\mu$ is an 
unknown constant, $m_1, \ldots, m_d$ are unknown functions and $U$ is an 
unobserved random variable satisfying $E(U \mid X = x) = 0$ for almost every $x$. 
Estimation is based on an i.i.d. random sample $\{Y_i, X_i : i = 1, \ldots, n\}$ 
of $(Y, X)$. We describe an estimator of the additive components 
$m_1, \ldots, m_d$ that converges in probability pointwise at the rate $n^{-2/5}$ 
when $F$ and the $m_j$'s are twice continuously differentiable and the second 
derivative of $F$ is sufficiently smooth. In contrast to previous estimators, 
only two derivatives are needed regardless of the dimension of $X$, so 
asymptotically there is no curse of dimensionality. Moreover, the estimators 
derived here have an oracle property. Specifically, the centered, scaled 
estimator of each additive component is asymptotically normally distributed 
with the same mean and variance that it would have if the other components 
were known. 

Linton and Härdle (1996) (hereinafter LH) developed an estimator of 
the additive components of (1.1) that is based on marginal integration. The 
marginal integration method is discussed in more detail below. The estimator 
of LH converges at the rate $n^{-2/5}$ and is asymptotically normally distributed, 
but it requires the $m_j$'s to have an increasing number of derivatives as the 
dimension of $X$ increases. Thus, it suffers from the curse of dimensionality. 
Our estimator avoids this problem. 

There is a large body of research on estimation of (1.1) when $F$ is the 
identity function so that $Y = \mu + m_1(X^1) + \cdots + m_d(X^d) + U$. Stone 
(1985, 1986) showed that $n^{-2/5}$ is the optimal $L_2$ rate of convergence of an 
estimator of the $m_j$'s when they are twice continuously differentiable. 
Stone (1994) and Newey (1997) describe spline estimators whose $L_2$ rate of 
convergence is $n^{-2/5}$, but the pointwise rates of convergence and asymptotic 
distributions of spline and other series estimators remain unknown. Breiman 
and Friedman (1985), Buja, Hastie and Tibshirani (1989), Hastie and 
Tibshirani (1990), Opsomer and Ruppert (1997), Mammen, Linton and Nielsen 
(1999) and Opsomer (2000) have investigated the properties of backfitting 
procedures. Mammen, Linton and Nielsen (1999) give conditions under which a 
smooth backfitting estimator of the $m_j$'s converges at the pointwise rate 
$n^{-2/5}$ when these functions are twice continuously differentiable. The 
estimator is asymptotically normally distributed and avoids the curse of 
dimensionality, but extending it to models in which $F$ is not the identity 
function appears to be quite difficult. Horowitz, Klemelä and Mammen (2002) 
(hereinafter HKM) discuss optimality properties of a variety of estimators 
for nonparametric additive models without link functions. 

Tjøstheim and Auestad (1994), Linton and Nielsen (1995), Chen, Härdle, 
Linton and Severance-Lossin (1996) and Fan, Härdle and Mammen (1998) 
have investigated the properties of marginal integration estimators for the 
case in which $F$ is the identity function. These estimators are based on the 
observation that when $F$ is the identity function, then $m_1(x^1)$, say, is 
given up to an additive constant by 

(1.2)  $\int E(Y \mid X = x)\, w(x^2, \ldots, x^d)\, dx^2 \cdots dx^d,$ 

where $w$ is a nonnegative function satisfying 

$\int w(x^2, \ldots, x^d)\, dx^2 \cdots dx^d = 1.$ 

Therefore, $m_1(x^1)$ can be estimated up to an additive constant by replacing 
$E(Y \mid X = x)$ in (1.2) with a nonparametric estimator. Linton and Nielsen 
(1995), Chen, Härdle, Linton and Severance-Lossin (1996) and Fan, Härdle 
and Mammen (1998) have given conditions under which a variety of estimators 
based on the marginal integration idea converge at rate $n^{-2/5}$ and are 
asymptotically normal. The latter two estimators have the oracle property. 
That is, the asymptotic distribution of the estimator of each additive 
component is the same as it would be if the other components were known. 
LH extend marginal integration to the case in which $F$ is not the identity 
function. However, marginal integration estimators have a curse of 
dimensionality: the smoothness of the $m_j$'s must increase as the dimension 
of $X$ increases to achieve $n^{-2/5}$ convergence. The reason for this is that 
estimating $E(Y \mid X = x)$ requires carrying out a $d$-dimensional 
nonparametric regression. If $d$ is large and the $m_j$'s are only twice 
differentiable, then the bias of the resulting estimator of $E(Y \mid X = x)$ 
converges to zero too slowly as $n \to \infty$ to estimate the $m_j$'s with an 
$n^{-2/5}$ rate. For example, the estimator of Fan, Härdle and Mammen (1998), 
which imposes the weakest smoothness conditions of any existing marginal 
integration estimator, requires more than two derivatives if $d > 5$. 

This paper describes a two-stage estimation procedure that does not 
require a $d$-dimensional nonparametric regression and, thereby, avoids the 
curse of dimensionality. In the first stage, nonlinear least squares is used to 
obtain a series approximation to each $m_j$. The first-stage procedure imposes 
the additive structure of (1.1) and yields estimates of the $m_j$'s that have 
smaller asymptotic biases than do estimators based on marginal integration 
or other procedures that require $d$-dimensional nonparametric estimation. 
The first-stage estimates are inputs to the second stage. The second-stage 
estimate of, say, $m_1$ is obtained by taking one Newton step from the 
first-stage estimate toward a local linear estimate. In large samples, the 
second-stage estimator has a structure similar to that of a local linear 
estimator, so deriving its pointwise rate of convergence and asymptotic 
distribution is relatively easy. The main results of this paper can also be 
obtained by using a local constant estimate in the second stage, and the 
results of Monte Carlo experiments described in Section 5 show that a local 
constant estimator has better finite-sample performance under some 
conditions. However, a local linear estimator has better boundary behavior 
and better ability to adapt to nonuniform designs, among other desirable 
properties [Fan and Gijbels (1996)]. 

Our approach differs from typical two-stage estimation, which aims at 
estimating one unknown parameter or function [e.g., Fan and Chen (1999)]. 
In this setting, a consistent estimator is obtained in the first stage and is 
updated in the second, possibly by taking a Newton step toward the 
optimum of an appropriate objective function. In contrast, in our setting, there 
are several unknown functions but we update the estimator of only one. It 
is essential that the first-stage estimators of the other functions have negli- 
gible bias. The variances of these estimators must also converge to zero but 
can have relatively slow rates. We show that asymptotically, the estimation 
error of the other functions does not appear in the updated estimator of the 
function of interest. 

HKM use a two-stage estimation approach that is similar to the one used 
here, but HKM do not consider models with link functions, and they use 
backfitting for the first-stage estimator. Derivation of the properties of a 
backfitting estimator for a model with a link function appears to be very 
complicated. We conjecture that a classical backfitting estimator would have 
the same asymptotic variance as the one in this paper but a different and, 
possibly, complicated bias. We also conjecture that a classical backfitting 
estimator would not have the oracle property. Nonetheless, we do not ar- 
gue here that our procedure outperforms classical backfitting, in the sense 
of minimizing an optimality criterion such as the asymptotic mean-square 
error. However, our procedure has the advantages of a complete asymptotic 
distribution theory and the oracle property. 

The remainder of this paper is organized as follows. Section 2 provides an 
informal description of the two-stage estimator. The main results are pre- 
sented in Section 3. Section 4 discusses the selection of bandwidths. Section 5 
presents the results of a small simulation study, and Section 6 presents con- 
cluding comments. The proofs of theorems are in Section 7. Throughout the 
paper, subscripts index observations and superscripts denote components of 
vectors. Thus, $X_i$ is the $i$th observation of $X$, $X^j$ is the $j$th component 
of $X$, and $X_i^j$ is the $i$th observation of the $j$th component. 

2. Informal description of the estimator. Assume that the support of $X$ 
is $\mathcal{X} \equiv [-1, 1]^d$, and normalize $m_1, \ldots, m_d$ so that 

$\int_{-1}^{1} m_j(v)\, dv = 0, \qquad j = 1, \ldots, d.$ 

For any $x \in \mathbb{R}^d$ define $m(x) = m_1(x^1) + \cdots + m_d(x^d)$, where $x^j$ is 
the $j$th component of $x$. Let $\{p_k : k = 1, 2, \ldots\}$ denote a basis for smooth 
functions on $[-1, 1]$. A precise definition of "smooth" and conditions that the basis 
functions must satisfy are given in Section 3. These conditions include 

(2.1)  $\int_{-1}^{1} p_k(v)\, dv = 0,$ 

(2.2)  $\int_{-1}^{1} p_j(v) p_k(v)\, dv = 1$ if $j = k$, and $0$ otherwise, 

and 

(2.3)  $m_j(x^j) = \sum_{k=1}^{\infty} \theta_{jk}\, p_k(x^j),$ 

for each $j = 1, \ldots, d$, each $x^j \in [0, 1]$ and suitable coefficients 
$\{\theta_{jk}\}$. For any positive integer $\kappa$, define 

$P_\kappa(x) = [1, p_1(x^1), \ldots, p_\kappa(x^1), p_1(x^2), \ldots, p_\kappa(x^2), \ldots, p_1(x^d), \ldots, p_\kappa(x^d)]'.$ 

Then for $\theta_\kappa \in \mathbb{R}^{\kappa d + 1}$, $P_\kappa(x)'\theta_\kappa$ is a series approximation 
to $\mu + m(x)$. Section 3 gives conditions that $\kappa$ must satisfy. These 
require that $\kappa \to \infty$ at an appropriate rate as $n \to \infty$. 
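
As an illustration of how the vector $P_\kappa(x)$ might be assembled, the following sketch uses normalized Legendre polynomials of degree at least one. This choice of basis is an assumption made only for the example (the paper's experiments use B-splines); it is convenient because each such function integrates to zero and has unit $L_2$ norm on $[-1, 1]$. The function names `legendre`, `p`, `series_vector` and `simpson` are hypothetical.

```python
import math

def legendre(k, v):
    """Legendre polynomial P_k(v), computed with the three-term recurrence."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, v
    for n in range(1, k):
        prev, cur = cur, ((2 * n + 1) * v * cur - n * prev) / (n + 1)
    return cur

def p(k, v):
    """Illustrative basis function p_k (k >= 1): zero mean, unit L2 norm on [-1, 1]."""
    return math.sqrt((2 * k + 1) / 2.0) * legendre(k, v)

def series_vector(x, kappa):
    """Stack [1, p_1(x^1..), ..., p_kappa(x^d)] for a point x in [-1, 1]^d."""
    vec = [1.0]
    for xj in x:
        vec.extend(p(k, xj) for k in range(1, kappa + 1))
    return vec

def simpson(f, a=-1.0, b=1.0, n=2000):
    """Composite Simpson rule, used here only to check the normalizations."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0
```

With $d$ covariates and $\kappa$ basis functions per covariate, `series_vector` returns $\kappa d + 1$ entries, matching the dimension of $\theta_\kappa$.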

To obtain the first-stage estimators of the $m_j$'s, let $\{Y_i, X_i : i = 1, \ldots, n\}$ 
be a random sample of $(Y, X)$. Let $\hat\theta_{n\kappa}$ be a solution to 

minimize over $\theta \in \Theta_\kappa$:  $S_{n\kappa}(\theta) \equiv n^{-1} \sum_{i=1}^{n} \{Y_i - F[P_\kappa(X_i)'\theta]\}^2,$ 

where $\Theta_\kappa \subset \mathbb{R}^{\kappa d + 1}$ is a compact parameter set. The series 
estimator of $\mu + m(x)$ is 

$\tilde\mu + \tilde m(x) = P_\kappa(x)'\hat\theta_{n\kappa},$ 

where $\tilde\mu$ is the first component of $\hat\theta_{n\kappa}$. The estimator of $m_j(x^j)$ 
for any $j = 1, \ldots, d$ and any $x^j \in [0, 1]$ is the product of 
$[p_1(x^j), \ldots, p_\kappa(x^j)]$ with the appropriate components of $\hat\theta_{n\kappa}$. 

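To obtain the second-stage estimator of (say) $m_1(x^1)$, let $\tilde X_i$ denote 
the $i$th observation of $\tilde X \equiv (X^2, \ldots, X^d)$. Define 
$\tilde m_{-1}(\tilde X_i) = \tilde m_2(X_i^2) + \cdots + \tilde m_d(X_i^d)$, where $X_i^j$ is the 
$i$th observation of the $j$th component of $X$ and $\tilde m_j$ is the series 
estimator of $m_j$. Let $K$ be a probability density function on $[-1, 1]$, 
and define $K_h(v) = K(v/h)$ for any real, positive constant $h$. Conditions 
that $K$ and $h$ must satisfy are given in Section 3. These include $h \to 0$ 
at an appropriate rate as $n \to \infty$. Define 

$S'_{nj1}(x^1, \tilde m) = -2 \sum_{i=1}^{n} \{Y_i - F[\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)]\}$ 
$\qquad \times F'[\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)] (X_i^1 - x^1)^j K_h(x^1 - X_i^1)$ 

for $j = 0, 1$ and 

$S''_{nj1}(x^1, \tilde m) = 2 \sum_{i=1}^{n} F'[\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)]^2 (X_i^1 - x^1)^j K_h(x^1 - X_i^1)$ 
$\qquad - 2 \sum_{i=1}^{n} \{Y_i - F[\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)]\}$ 
$\qquad \times F''[\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)] (X_i^1 - x^1)^j K_h(x^1 - X_i^1)$ 

for $j = 0, 1, 2$. The second-stage estimator of $m_1(x^1)$ is 

(2.4)  $\hat m_1(x^1) = \tilde m_1(x^1) - \dfrac{S''_{n21}(x^1, \tilde m) S'_{n01}(x^1, \tilde m) - S''_{n11}(x^1, \tilde m) S'_{n11}(x^1, \tilde m)}{S''_{n01}(x^1, \tilde m) S''_{n21}(x^1, \tilde m) - S''_{n11}(x^1, \tilde m)^2}.$ 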

The second-stage estimators of $m_2(x^2), \ldots, m_d(x^d)$ are obtained similarly. 
Section 3.3 describes a weighted version of this estimator that minimizes the 
asymptotic variance of $n^{2/5}[\hat m_1(x^1) - m_1(x^1)]$. However, due to interactions 
between the weight function and the bias, the weighted estimator does not 
necessarily minimize the asymptotic mean-square error. 

The estimator (2.4) can be understood intuitively as follows. If $\tilde\mu$ and 
$\tilde m_{-1}$ were the true values of $\mu$ and $m_{-1}$, the local linear estimator of 
$m_1(x^1)$ would minimize 

(2.5)  $S_{n1}(x^1, b_0, b_1) = \sum_{i=1}^{n} \{Y_i - F[\tilde\mu + b_0 + b_1(X_i^1 - x^1) + \tilde m_{-1}(\tilde X_i)]\}^2 K_h(x^1 - X_i^1).$ 

Moreover, $S'_{nj1}(x^1, \tilde m) = \partial S_{n1}(x^1, b_0, b_1)/\partial b_j$ ($j = 0, 1$) evaluated 
at $b_0 = \tilde m_1(x^1)$ and $b_1 = 0$. $S''_{nj1}(x^1, \tilde m)$ gives the second derivatives 
of $S_{n1}(x^1, b_0, b_1)$ evaluated at the same point. The estimator (2.4) is the 
result of taking one Newton step from the starting values $b_0 = \tilde m_1(x^1)$, 
$b_1 = 0$ toward the minimum of the right-hand side of (2.5). 
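
The Newton step just described can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it accumulates the first and second derivatives of the local objective (2.5) at $b_0 = \tilde m_1(x^1)$, $b_1 = 0$ and returns the updated $b_0$. The function name `second_stage_step`, its argument layout and the biweight kernel are assumptions made for the example; the first-stage quantities are taken as given inputs.

```python
def biweight(v):
    """Biweight kernel, a probability density supported on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - v * v) ** 2 if abs(v) <= 1.0 else 0.0

def second_stage_step(x1, X1, Y, mu, m1_start, m_rest, F, dF, d2F, h,
                      kernel=biweight):
    """One Newton step from (b0, b1) = (m1_start, 0) for the local objective.

    X1     : observations of the first covariate
    m_rest : first-stage values of the other components, one per observation
    F, dF, d2F : link function and its first two derivatives
    """
    s1 = [0.0, 0.0]        # first derivatives, j = 0, 1
    s2 = [0.0, 0.0, 0.0]   # second-derivative sums, j = 0, 1, 2
    for xi, yi, ri in zip(X1, Y, m_rest):
        w = kernel((x1 - xi) / h)
        if w == 0.0:
            continue
        index = mu + m1_start + ri
        resid = yi - F(index)
        for j in range(3):
            power = (xi - x1) ** j
            if j < 2:
                s1[j] += -2.0 * resid * dF(index) * power * w
            s2[j] += (2.0 * dF(index) ** 2
                      - 2.0 * resid * d2F(index)) * power * w
    det = s2[0] * s2[2] - s2[1] ** 2   # determinant of the 2 x 2 Hessian
    return m1_start - (s2[2] * s1[0] - s2[1] * s1[1]) / det
```

When $F$ is the identity the local objective is quadratic in $(b_0, b_1)$, so a single step from any starting value lands exactly on the local linear fit; for a nonlinear link the step is only an approximation whose quality depends on the first-stage estimate.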

Section 3 gives conditions under which $\hat m_1(x^1) - m_1(x^1) = O_p(n^{-2/5})$ and 
$n^{2/5}[\hat m_1(x^1) - m_1(x^1)]$ is asymptotically normally distributed for any finite 
$d$ when $F$ and the $m_j$'s are twice continuously differentiable. 



3. Main results. This section has three parts. Section 3.1 states the 
assumptions that are used to prove the main results. Section 3.2 states the 
results. The main results are the $n^{-2/5}$-consistency and asymptotic 
normality of the $\hat m_j$'s. Section 3.3 describes the weighted estimator. 

The following additional notation is used. For any matrix $A$, define the 
norm $\|A\| = [\operatorname{trace}(A'A)]^{1/2}$. Define $U = Y - F[\mu + m(X)]$, 
$V(x) = \operatorname{Var}(U \mid X = x)$, 
$Q_\kappa = E\{F'[\mu + m(X)]^2 P_\kappa(X) P_\kappa(X)'\}$, and 
$\Psi_\kappa = Q_\kappa^{-1} E\{F'[\mu + m(X)]^2 V(X) P_\kappa(X) P_\kappa(X)'\} Q_\kappa^{-1}$ 
whenever the latter quantity exists. $Q_\kappa$ and $\Psi_\kappa$ are $d(\kappa) \times d(\kappa)$ 
positive semidefinite matrices, where $d(\kappa) = \kappa d + 1$. Let $\lambda_{\kappa,\min}$ denote 
the smallest eigenvalue of $Q_\kappa$. Let $Q_{\kappa,ij}$ denote the $(i, j)$ element of $Q_\kappa$. 
Define $\zeta_\kappa = \sup_{x \in \mathcal{X}} \|P_\kappa(x)\|$. Let $\{\theta_{jk}\}$ be the coefficients of the 
series expansion (2.3). For each $\kappa$ define 

$\theta_\kappa = (\mu, \theta_{11}, \ldots, \theta_{1\kappa}, \theta_{21}, \ldots, \theta_{2\kappa}, \ldots, \theta_{d1}, \ldots, \theta_{d\kappa})'.$ 

3.1. Assumptions. The main results are obtained under the following 
assumptions. 

Assumption A1. The data, $\{(Y_i, X_i) : i = 1, \ldots, n\}$, are an i.i.d. random 
sample from the distribution of $(Y, X)$, and $E(Y \mid X = x) = F[\mu + m(x)]$ for 
almost every $x \in \mathcal{X} \equiv [-1, 1]^d$. 

Assumption A2. (i) The support of $X$ is $\mathcal{X}$. 

(ii) The distribution of $X$ is absolutely continuous with respect to Lebesgue 
measure. 

(iii) The probability density function of $X$ is bounded, bounded away 
from zero and twice continuously differentiable on $\mathcal{X}$. 

(iv) There are constants $c_V > 0$ and $C_V < \infty$ such that 
$c_V \leq \operatorname{Var}(U \mid X = x) \leq C_V$ for all $x \in \mathcal{X}$. 

(v) There is a constant $C_U < \infty$ such that $E|U|^j \leq C_U^{j-2}\, j!\, E(U^2) < \infty$ 
for all $j \geq 2$. 

Assumption A3. (i) There is a constant $C_m < \infty$ such that $|m_j(v)| \leq C_m$ 
for each $j = 1, \ldots, d$ and all $v \in [-1, 1]$. 

(ii) Each function $m_j$ is twice continuously differentiable on $[-1, 1]$. 

(iii) There are constants $C_{F1} < \infty$, $c_{F2} > 0$, and $C_{F2} < \infty$ such that 
$F(v) \leq C_{F1}$ and $c_{F2} \leq F'(v) \leq C_{F2}$ for all $v \in [\mu - C_m d, \mu + C_m d]$. 

(iv) $F$ is twice continuously differentiable on $[\mu - C_m d, \mu + C_m d]$. 

(v) There is a constant $C_{F3} < \infty$ such that 
$|F''(v_2) - F''(v_1)| \leq C_{F3} |v_2 - v_1|$ for all $v_2, v_1 \in [\mu - C_m d, \mu + C_m d]$. 

Assumption A4. (i) There are constants $C_Q < \infty$ and $c_\lambda > 0$ such that 
$|Q_{\kappa,ij}| \leq C_Q$ and $\lambda_{\kappa,\min} > c_\lambda$ for all $\kappa$ and all $i, j = 1, \ldots, d(\kappa)$. 

(ii) The largest eigenvalue of $\Psi_\kappa$ is bounded for all $\kappa$. 

Assumption A5. (i) The functions $\{p_k\}$ satisfy (2.1) and (2.2). 

(ii) There is a constant $c_\kappa > 0$ such that $\zeta_\kappa \geq c_\kappa$ for all sufficiently 
large $\kappa$. 

(iii) $\zeta_\kappa = O(\kappa^{1/2})$ as $\kappa \to \infty$. 

(iv) There are a constant $C_\theta < \infty$ and vectors 
$\theta_{\kappa 0} \in \Theta_\kappa \equiv [-C_\theta, C_\theta]^{d(\kappa)}$ such that 
$\sup_{x \in \mathcal{X}} |\mu + m(x) - P_\kappa(x)'\theta_{\kappa 0}| = O(\kappa^{-2})$ as $\kappa \to \infty$. 

(v) For each $\kappa$, $\theta_{\kappa 0}$ is an interior point of $\Theta_\kappa$. 

Assumption A6. (i) $\kappa = C_\kappa n^{4/15 + \nu}$ for some constant $C_\kappa$ satisfying 
$0 < C_\kappa < \infty$ and some $\nu$ satisfying $0 < \nu < 1/30$. 

(ii) $h = C_h n^{-1/5}$ for some constant $C_h$ satisfying $0 < C_h < \infty$. 

Assumption A7. The function $K$ is a bounded, continuous probability 
density function on $[-1, 1]$ and is symmetric about 0. 






J. L. HOROWITZ AND E. MAMMEN 



The assumption that the support of $X$ is $[-1, 1]^d$ entails no loss of 
generality as it can always be satisfied by carrying out monotone increasing 
transformations of the components of $X$, even if their support before 
transformation is unbounded. For practical computations, it suffices to transform 
the empirical support to $[-1, 1]^d$. Assumption A2 precludes the possibility 
of treating discrete covariates with our method, though they can be han- 
dled inelegantly by conditioning on them. Another possibility is to develop 
a version of our estimator for a partially linear generalized additive model in 
which discrete covariates are included in the parametric (linear) term. How- 
ever, this extension is beyond the scope of the present paper. Differentiability 
of the density of X [Assumption A2(iii)] is used to insure that the bias of our 
estimator converges to zero sufficiently rapidly. Assumption A2(v) restricts 
the thickness of the tails of the distribution of $U$ and is used to prove 
consistency of the first-stage estimator. Assumption A3 defines the sense in which 
$F$ and the $m_j$'s must be smooth. Assumption A3(iii) is needed for 
identification. Assumption A4 insures the existence and nonsingularity of the 
covariance matrix of the asymptotic form of the first-stage estimator. This 
is analogous to assuming that the information matrix is positive definite 
in parametric maximum likelihood estimation. Assumption A4(i) implies 
Assumption A4(ii) if $U$ is homoskedastic. Assumptions A5(iii) and A5(iv) 
bound the magnitudes of the basis functions and insure that the errors in 
the series approximations to the $m_j$'s converge to zero sufficiently rapidly 
as $\kappa \to \infty$. These assumptions are satisfied by spline and (for periodic 
functions) Fourier bases. Assumption A6 states the rates at which $\kappa \to \infty$ and 
$h \to 0$ as $n \to \infty$. The assumed rate of convergence of $h$ is well known to be 
asymptotically optimal for one-dimensional kernel mean-regression when the 
conditional mean function is twice continuously differentiable. The required 
rate for $\kappa$ insures that the asymptotic bias and variance of the first-stage 
estimator are sufficiently small to achieve an $n^{-2/5}$ rate of convergence in 
the second stage. The $L_2$ rate of convergence of a series estimator of $m_j$ 
is maximized by setting $\kappa \propto n^{1/5}$, which is slower than the rates permitted 
by Assumption A6(i) [Newey (1997)]. Thus, Assumption A6(i) requires the 
first-stage estimator to be undersmoothed. Undersmoothing is needed to 
insure sufficiently rapid convergence of the bias of the first-stage estimator. 
We show that the first-order performance of our second-stage estimator does 
not depend on the choice of k if Assumption A6(i) is satisfied. See Theo- 
rems 2 and 3. Optimizing the choice of k would require a rather complicated 
higher-order theory and is beyond the scope of this paper, which is restricted 
to first-order asymptotics. 
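
To make the rate conditions concrete, the sketch below evaluates the series length and the bandwidth for a given sample size. It assumes that Assumption A6 takes the form $\kappa = C_\kappa n^{4/15+\nu}$ with $0 < \nu < 1/30$ and $h = C_h n^{-1/5}$; the default constants and the function name `tuning_rates` are hypothetical and carry no recommendation.

```python
def tuning_rates(n, c_kappa=1.0, c_h=1.0, nu=1.0 / 60.0):
    """Return (kappa, h) under the assumed rates kappa ~ n^(4/15 + nu), h ~ n^(-1/5)."""
    if not 0.0 < nu < 1.0 / 30.0:
        raise ValueError("nu must lie strictly between 0 and 1/30")
    kappa = c_kappa * n ** (4.0 / 15.0 + nu)   # number of basis functions per component
    h = c_h * n ** (-1.0 / 5.0)                # second-stage bandwidth
    return kappa, h

def l2_optimal_rate(n, c=1.0):
    """Series length that balances bias and variance in L2, for comparison."""
    return c * n ** (1.0 / 5.0)
```

For any $n > 1$ the first value exceeds the $L_2$-optimal length with the same constant, which is the undersmoothing discussed above. In practice $\kappa$ would be rounded to an integer.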

3.2. Theorems. This section states two theorems that give the main 
results of the paper. Theorem 1 gives the asymptotic behavior of the first-stage 
series estimator under Assumptions A1-A6(i). Theorem 2 gives the 
properties of the second-stage estimator. For $i = 1, \ldots, n$, define 
$U_i = Y_i - F[\mu + m(X_i)]$ and $b_{\kappa 0}(x) = \mu + m(x) - P_\kappa(x)'\theta_{\kappa 0}$. Let $\|v\|$ 
denote the Euclidean norm of any finite-dimensional vector $v$. 

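Theorem 1. Let Assumptions A1-A6(i) hold. Then: 

(a) $\lim_{n \to \infty} \|\hat\theta_{n\kappa} - \theta_{\kappa 0}\| = 0$ almost surely, 



(d) $\hat\theta_{n\kappa} - \theta_{\kappa 0} = n^{-1} Q_\kappa^{-1} \sum_{i=1}^{n} F'[\mu + m(X_i)] P_\kappa(X_i) U_i + n^{-1} Q_\kappa^{-1} \sum_{i=1}^{n} F'[\mu + m(X_i)]^2 P_\kappa(X_i) b_{\kappa 0}(X_i) + R_n$, 
where $\|R_n\| = O_p(\kappa^{3/2}/n + n^{-1/2})$. 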



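Now let $f_X$ denote the probability density function of $X$. For $j = 0, 1$, 
define 

$S'_{nj1}(x^1, m) = -2 \sum_{i=1}^{n} \{Y_i - F[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]\}$ 
$\qquad \times F'[\mu + m_1(x^1) + m_{-1}(\tilde X_i)] (X_i^1 - x^1)^j K_h(x^1 - X_i^1).$ 

Also define 

$D_0(x^1) = 2 \int F'[\mu + m_1(x^1) + m_{-1}(\tilde x)]^2 f_X(x^1, \tilde x)\, d\tilde x,$ 

$D_1(x^1) = 2 \int F'[\mu + m_1(x^1) + m_{-1}(\tilde x)]^2 [\partial f_X(x^1, \tilde x)/\partial x^1]\, d\tilde x,$ 

$A_K = \int_{-1}^{1} v^2 K(v)\, dv, \qquad B_K = \int_{-1}^{1} K(v)^2\, dv,$ 

$g(x^1, \tilde x) = F''[\mu + m_1(x^1) + m_{-1}(\tilde x)]\, m_1'(x^1)^2 + F'[\mu + m_1(x^1) + m_{-1}(\tilde x)]\, m_1''(x^1),$ 

$\beta_1(x^1) = 2 C_h^2 A_K D_0(x^1)^{-1} \int g(x^1, \tilde x) F'[\mu + m_1(x^1) + m_{-1}(\tilde x)] f_X(x^1, \tilde x)\, d\tilde x$ 

and 

$V_1(x^1) = B_K C_h^{-1} D_0(x^1)^{-2} \int \operatorname{Var}(U \mid x^1, \tilde x) F'[\mu + m_1(x^1) + m_{-1}(\tilde x)]^2 f_X(x^1, \tilde x)\, d\tilde x.$ 

The next theorem gives the asymptotic properties of the second-stage 
estimator. 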

Theorem 2. Let Assumptions A1-A6 hold. Then: 

(a) $\hat m_1(x^1) - m_1(x^1) = [nhD_0(x^1)]^{-1} \{-S'_{n01}(x^1, m) + [D_1(x^1)/D_0(x^1)] S'_{n11}(x^1, m)\} + o_p(n^{-2/5})$ 
uniformly over $|x^1| \leq 1 - h$ and $\hat m_1(x^1) - m_1(x^1) = O_p[(\log n)^{1/2} n^{-2/5}]$ 
uniformly over $|x^1| \leq 1$. 

(b) $n^{2/5}[\hat m_1(x^1) - m_1(x^1)] \to_d N[\beta_1(x^1), V_1(x^1)]$. 

(c) If $j \neq 1$, then $n^{2/5}[\hat m_1(x^1) - m_1(x^1)]$ and $n^{2/5}[\hat m_j(x^j) - m_j(x^j)]$ are 
asymptotically independently normally distributed. 

Theorem 2(a) implies that asymptotically, $n^{2/5}[\hat m_1(x^1) - m_1(x^1)]$ is not 
affected by random sampling errors in the first-stage estimator. In fact, 
the second-stage estimator of $m_1(x^1)$ has the same asymptotic distribution 
that it would have if $m_2, \ldots, m_d$ were known and local-linear estimation 
were used to estimate $m_1(x^1)$ directly. In this sense, our estimator has an 
oracle property. Parts (b) and (c) of Theorem 2 imply that the estimators 
of $m_1(x^1), \ldots, m_d(x^d)$ are asymptotically independently distributed. 

It is also possible to use a local-constant estimator in the second stage. 
The resulting second-stage estimator is 

$\hat m_{1,LC}(x^1) = \tilde m_1(x^1) - S'_{n01}(x^1, \tilde m)/S''_{n01}(x^1, \tilde m).$ 

The following modification of Theorem 2, which we state without proof, 
gives the asymptotic properties of the local-constant second-stage estimator. 
Define 

$g_{LC}(x^1, \tilde x) = (\partial^2/\partial\zeta^2)\{F[m_1(\zeta + x^1) + m_{-1}(\tilde x)] - F[m_1(x^1) + m_{-1}(\tilde x)]\} f_X(\zeta + x^1, \tilde x)\big|_{\zeta = 0}$ 

and 

$\beta_{1,LC}(x^1) = 2 C_h^2 A_K D_0(x^1)^{-1} \int g_{LC}(x^1, \tilde x) F'[\mu + m_1(x^1) + m_{-1}(\tilde x)] f_X(x^1, \tilde x)\, d\tilde x.$ 

Theorem 3. Let Assumptions A1-A6 hold. Then: 

(a) $\hat m_{1,LC}(x^1) - m_1(x^1) = -[nhD_0(x^1)]^{-1} S'_{n01}(x^1, m) + o_p(n^{-2/5})$ 
uniformly over $|x^1| \leq 1 - h$ and $\hat m_{1,LC}(x^1) - m_1(x^1) = O_p[(\log n)^{1/2} n^{-2/5}]$ 
uniformly over $|x^1| \leq 1$. 

(b) $n^{2/5}[\hat m_{1,LC}(x^1) - m_1(x^1)] \to_d N[\beta_{1,LC}(x^1), V_1(x^1)]$. 

(c) If $j \neq 1$, then $n^{2/5}[\hat m_{1,LC}(x^1) - m_1(x^1)]$ and 
$n^{2/5}[\hat m_{j,LC}(x^j) - m_j(x^j)]$ are asymptotically independently normally distributed. 

$V_1(x^1)$, $\beta_1(x^1)$ and $\beta_{1,LC}(x^1)$ can be estimated consistently by 
replacing unknown population parameters with consistent estimators. Section 4 
gives a method for estimating the derivatives of $m_1$ that are in the 
expressions for $\beta_1(x^1)$ and $\beta_{1,LC}(x^1)$. As is usual in nonparametric 
estimation, reasonably precise bias estimation is possible only by making 
assumptions that amount to undersmoothing. One way of doing this is to assume 
that the second derivative of $m_1$ satisfies a Lipschitz condition. 
Alternatively, one can set $h = C_h n^{-\gamma}$ for $1/5 < \gamma < 1$. Then 
$n^{(1-\gamma)/2}[\hat m_1(x^1) - m_1(x^1)] \to_d N[0, V_1(x^1)]$ 
and $n^{(1-\gamma)/2}[\hat m_{1,LC}(x^1) - m_1(x^1)] \to_d N[0, V_1(x^1)]$. 
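
Under the undersmoothed bandwidth $h = C_h n^{-\gamma}$ the limiting distribution is centered at zero, so a pointwise confidence interval needs no bias correction. The sketch below shows the arithmetic, taking a point estimate and a consistent estimate of $V_1(x^1)$ as given; the function name `pointwise_ci` and its arguments are hypothetical, and producing the variance estimate itself is outside the scope of the example.

```python
import math
from statistics import NormalDist

def pointwise_ci(m_hat, v_hat, n, gamma, level=0.95):
    """Interval m_hat +/- z * sqrt(v_hat) / n^((1 - gamma)/2) for 1/5 < gamma < 1."""
    if not 0.2 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 1/5 and 1")
    if v_hat < 0.0:
        raise ValueError("variance estimate must be nonnegative")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # two-sided normal critical value
    half_width = z * math.sqrt(v_hat) / n ** ((1.0 - gamma) / 2.0)
    return m_hat - half_width, m_hat + half_width
```

A larger $\gamma$ removes more bias but slows the rate $n^{(1-\gamma)/2}$, so the resulting intervals are wider than the $n^{-2/5}$ rate would suggest.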

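3.3. A weighted estimator. A weighted estimator can be obtained by 
replacing $S'_{nj1}(x^1, \tilde m)$ and $S''_{nj1}(x^1, \tilde m)$ in (2.4) with 

$S'_{nj1}(x^1, \tilde m, w) = -2 \sum_{i=1}^{n} w(x^1, \tilde X_i) \{Y_i - F[\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)]\}$ 
$\qquad \times F'[\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)] (X_i^1 - x^1)^j K_h(x^1 - X_i^1)$ 

and 

$S''_{nj1}(x^1, \tilde m, w) = 2 \sum_{i=1}^{n} w(x^1, \tilde X_i) F'[\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)]^2 (X_i^1 - x^1)^j K_h(x^1 - X_i^1)$ 
$\qquad - 2 \sum_{i=1}^{n} w(x^1, \tilde X_i) \{Y_i - F[\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)]\}$ 
$\qquad \times F''[\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)] (X_i^1 - x^1)^j K_h(x^1 - X_i^1)$ 

for $j = 0, 1, 2$, where $w$ is a nonnegative weight function that is assumed for 
the moment to be nonstochastic. It is convenient to normalize $w$ so that 

$\int w(x^1, \tilde x) F'[\mu + m_1(x^1) + m_{-1}(\tilde x)]^2 f_X(x^1, \tilde x)\, d\tilde x = 1$ 

for each $x^1 \in [-1, 1]$. Arguments identical to those used to prove Theorem 2 
show that the variance of the asymptotic distribution of the resulting 
local-linear or local-constant estimator of $m_1(x^1)$ is 

$V_1(x^1, w) = 0.25\, B_K C_h^{-1} \int w(x^1, \tilde x)^2 \operatorname{Var}(U \mid x^1, \tilde x) F'[\mu + m_1(x^1) + m_{-1}(\tilde x)]^2 f_X(x^1, \tilde x)\, d\tilde x.$ 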






It follows from Lemma 1 of Fan, Härdle and Mammen (1998) that $V_1(x^1, w)$ is 
minimized by setting $w(x^1, \tilde x)^2 \propto 1/\operatorname{Var}(U \mid x^1, \tilde x)$, thereby yielding 

$V_1(x^1, w) = 0.25\, B_K C_h^{-1} D_2(x^1)^{-1} \int F'[\mu + m_1(x^1) + m_{-1}(\tilde x)]^2 f_X(x^1, \tilde x)\, d\tilde x,$ 

where 

$D_2(x^1) = \int \operatorname{Var}(U \mid x^1, \tilde x)^{-1} F'[\mu + m_1(x^1) + m_{-1}(\tilde x)]^2 f_X(x^1, \tilde x)\, d\tilde x.$ 



In an application, it suffices to replace the variance-minimizing weight 
function with a consistent estimator. For example, 
$F'[\mu + m_1(x^1) + m_{-1}(\tilde x)]$ can be estimated from the first estimation stage, 
$\operatorname{Var}(U \mid x^1, \tilde x)$ can be estimated by applying a nonparametric regression to 
the squared residuals of the first-stage estimate and kernel methods can be 
used to estimate $f_X(x^1, \tilde x)$. 

The minimum-variance estimator is not a minimum asymptotic mean-square 
error estimator unless undersmoothing is used to remove the asymptotic 
bias of $\hat m_1$. This is because weighting affects the bias when the latter 
is nonnegligible. The weight function that minimizes the asymptotic 
mean-square error is the solution to an integral equation and does not have a 
closed-form analytic representation. 

4. Bandwidth selection. This section presents a plug-in and a penalized 
least squares (PLS) method for choosing $h$ in applications. We begin 
with a description of the plug-in method. This method estimates the value 
of $h$ that minimizes the asymptotic integrated mean-square error (AIMSE) 
of $n^{2/5}[\hat m_j(x^j) - m_j(x^j)]$ for $j = 1, \ldots, d$. We discuss only local-linear 
estimation, but similar results hold for local-constant estimation. The AIMSE 
of $n^{2/5}(\hat m_1 - m_1)$ is defined as 

$\mathrm{AIMSE}_1 = n^{4/5} \int_{-1}^{1} w(x^1) [\beta_1(x^1)^2 + V_1(x^1)]\, dx^1,$ 

where $w(\cdot)$ is a nonnegative weight function that integrates to 1. We also 
define the integrated squared error (ISE) as 

$\mathrm{ISE}_1 = n^{4/5} \int_{-1}^{1} w(x^1) [\hat m_1(x^1) - m_1(x^1)]^2\, dx^1.$ 

We define the asymptotically optimal bandwidth for estimating $m_1$ as 
$C_{h1} n^{-1/5}$, where $C_{h1}$ minimizes $\mathrm{AIMSE}_1$. Let 

$\tilde\beta_1(x^1) = \beta_1(x^1)/C_h^2$ and $\tilde V_1(x^1) = C_h V_1(x^1)$. 

Then 

(4.1)  $C_{h1} = \left[ \dfrac{1}{4}\, \dfrac{\int_{-1}^{1} w(x^1) \tilde V_1(x^1)\, dx^1}{\int_{-1}^{1} w(x^1) \tilde\beta_1(x^1)^2\, dx^1} \right]^{1/5}.$ 

The results for the plug-in method rely on the following two theorems. 
Theorem 4 shows that the difference between the ISE and AIMSE is 
asymptotically negligible. Theorem 5 gives a method for estimating the first and 
second derivatives of $m_j$. Let $G^{(\ell)}$ denote the $\ell$th derivative of any 
$\ell$-times differentiable function $G$. 

Theorem 4. Let Assumptions A1-A6 hold. Then for a continuous weight 
function $w(\cdot)$ and as $n \to \infty$, $\mathrm{AIMSE}_1 = \mathrm{ISE}_1 + o_p(1)$. 

Theorem 5. Let Assumptions A1-A6 hold. Let $L$ be a twice differentiable 
probability density function on $[-1, 1]$, and let $\{g_n : n = 1, 2, \ldots\}$ be a 
sequence of strictly positive real numbers satisfying $g_n \to 0$ and 
$g_n^2 n^{4/5} (\log n)^{-1} \to \infty$ as $n \to \infty$. For $\ell = 1, 2$ define 

$\hat m_1^{(\ell)}(x^1) = g_n^{-1-\ell} \int_{-1}^{1} L^{(\ell)}[(x^1 - v)/g_n]\, \hat m_1(v)\, dv.$ 

Then as $n \to \infty$ and for $\ell = 1, 2$, 

$\sup_{|x^1| \leq 1} |\hat m_1^{(\ell)}(x^1) - m_1^{(\ell)}(x^1)| = o_p(1).$ 

A plug-in estimator of $C_{h1}$ can now be obtained by replacing unknown 
population quantities on the right-hand side of (4.1) with consistent 
estimators. Theorem 5 provides consistent estimators of the required derivatives 
of $m_1$. Estimators of the conditional variance of $U$ and of $f_X$ can be obtained 
by using standard kernel methods. 
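
Once estimates of the rescaled bias and variance functions are available, the plug-in constant is a ratio of two one-dimensional integrals. The sketch below assumes that (4.1) has the form $C_{h1} = [\int w \tilde V_1 / (4 \int w \tilde\beta_1^2)]^{1/5}$, which is the minimizer of $C^4 \int w \tilde\beta_1^2 + C^{-1} \int w \tilde V_1$ over $C > 0$. The function names and the use of Simpson's rule are choices made for the illustration only; the callables passed in stand for whatever consistent estimators are used.

```python
def simpson(f, a=-1.0, b=1.0, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def plug_in_constant(beta_tilde, v_tilde, weight):
    """Constant minimizing C^4 * int(w * beta^2) + int(w * V) / C over C > 0."""
    variance_term = simpson(lambda x: weight(x) * v_tilde(x))
    bias_term = simpson(lambda x: weight(x) * beta_tilde(x) ** 2)
    if bias_term <= 0.0:
        raise ValueError("integrated squared bias must be positive")
    return (variance_term / (4.0 * bias_term)) ** 0.2

def plug_in_bandwidth(n, beta_tilde, v_tilde, weight):
    """Bandwidth C_h1 * n^(-1/5) implied by the plug-in constant."""
    return plug_in_constant(beta_tilde, v_tilde, weight) * n ** (-0.2)
```

If the estimated bias function is close to zero everywhere the ratio becomes unstable, which is one practical reason a weight function that downweights the boundary may be preferred.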

We now describe the PLS method. This method simultaneously estimates 
the bandwidths for second-stage estimation of all of the functions $m_j$ 
($j = 1, \ldots, d$). Let $h_j = C_{hj} n^{-1/5}$ be the bandwidth for $\hat m_j$. Then the PLS 
method selects the $C_{hj}$'s that minimize an estimate of the average squared 
error (ASE), 

$\mathrm{ASE}(\bar h) = n^{-1} \sum_{i=1}^{n} \{F[\tilde\mu + \hat m(X_i)] - F[\mu + m(X_i)]\}^2,$ 

where $\bar h = (C_{h1} n^{-1/5}, \ldots, C_{hd} n^{-1/5})$. Specifically, the PLS method selects 
the $C_{hj}$'s to 

(4.2)  minimize over $C_{h1}, \ldots, C_{hd}$: 
$\mathrm{PLS}(\bar h) = n^{-1} \sum_{i=1}^{n} \{Y_i - F[\tilde\mu + \hat m(X_i)]\}^2$ 
$\qquad + 2K(0)\, n^{-1} \sum_{i=1}^{n} \{F'[\tilde\mu + \hat m(X_i)]^2 \hat V(X_i)\} \sum_{j=1}^{d} [n^{4/5} C_{hj} \hat D_j(X_i^j)]^{-1},$ 

where the $C_{hj}$'s are restricted to a compact, positive interval that excludes 0, 

$\hat D_j(x^j) = \dfrac{1}{n h_j} \sum_{i=1}^{n} K_{h_j}(X_i^j - x^j) F'[\tilde\mu + \hat m(X_i)]^2$ 

and 

$\hat V(x) = \left[ \sum_{i=1}^{n} K_{h_1}(X_i^1 - x^1) \cdots K_{h_d}(X_i^d - x^d) \right]^{-1}$ 
$\qquad \times \sum_{i=1}^{n} K_{h_1}(X_i^1 - x^1) \cdots K_{h_d}(X_i^d - x^d) \{Y_i - F[\tilde\mu + \hat m(X_i)]\}^2.$ 
The bandwidths used for $\hat V$ may be different from those used for $\hat m$ because 
$\hat V$ is a full-dimensional nonparametric estimator. We now argue that the 
difference 

$n^{-1} \sum_{i=1}^{n} U_i^2 + \mathrm{ASE}(\bar h) - \mathrm{PLS}(\bar h)$ 

is asymptotically negligible and, therefore, that the solution to (4.2) 
estimates the bandwidths that minimize ASE. A proof of this result only 
requires additional smoothness conditions on $F$ and more restrictive 
assumptions on $\kappa$. The proof can be carried out by making arguments similar 
to those used in the proof of Theorem 2 but with a higher-order stochastic 
expansion for $\hat m - m$. Here, we provide only a heuristic outline. For this 
purpose, note that 

$n^{-1} \sum_{i=1}^{n} U_i^2 + \mathrm{ASE}(\bar h) - \mathrm{PLS}(\bar h)$ 
$\qquad = 2 n^{-1} \sum_{i=1}^{n} \{F[\tilde\mu + \hat m(X_i)] - F[\mu + m(X_i)]\} U_i$ 
$\qquad\quad - 2K(0)\, n^{-1} \sum_{i=1}^{n} F'[\tilde\mu + \hat m(X_i)]^2 \hat V(X_i) \sum_{j=1}^{d} [n^{4/5} C_{hj} \hat D_j(X_i^j)]^{-1}.$ 

We now approximate $F[\tilde\mu + \hat m(X_i)] - F[\mu + m(X_i)]$ by a linear expansion 
in $\hat m - m$ and replace $\hat m - m$ with the stochastic approximation of 
Theorem 2(a). (A rigorous argument would require a higher-order expansion of 
$\hat m - m$.) Thus, $F[\tilde\mu + \hat m(X_i)] - F[\mu + m(X_i)]$ is approximated by a linear 
form in $U_i$. Dropping higher-order terms leads to an approximation of 

$\dfrac{2}{n} \sum_{i=1}^{n} \{F[\tilde\mu + \hat m(X_i)] - F[\mu + m(X_i)]\} U_i$ 

that is a U statistic in $U_i$. The off-diagonal terms of the U statistic can be 
shown to be of higher order and, therefore, asymptotically negligible. Thus, 
we get 

$\dfrac{2}{n} \sum_{i=1}^{n} \{F[\tilde\mu + \hat m(X_i)] - F[\mu + m(X_i)]\} U_i$ 
$\qquad \approx \dfrac{4}{n} \sum_{i=1}^{n} F'[\mu + m(X_i)]^2 \operatorname{Var}(U_i \mid X_i) \sum_{j=1}^{d} [n^{4/5} C_{hj} D_{0j}(X_i^j)]^{-1} K(0),$ 

where 

$D_{0j}(x^j) = 2 E\{F'[\mu + m(X_i)]^2 \mid X_i^j = x^j\} f_{X^j}(x^j)$ 

and $f_{X^j}$ is the probability density function of $X^j$. Now by standard 
kernel smoothing arguments, $D_{0j}(x^j) \approx \hat D_j(x^j)$. In addition, it is clear that 
$\hat V(X_i) \approx \operatorname{Var}(U_i \mid X_i)$, which establishes the desired result. 

5. Monte Carlo experiments. This section presents the results of a small 
set of Monte Carlo experiments that compare the finite-sample performances 
of the two-stage estimator, the estimator of LH and the infeasible oracle es- 
timator in which all additive components but one are known. The oracle 
estimator cannot be used in applications but provides a benchmark against 
which our feasible estimator can be compared. The infeasible oracle estima- 
tor was calculated by solving (2.5). 

Experiments were carried out with d = 2 and d = 5. The sample size is 
n = 500. The experiments with d = 2 consist of estimating $f_1$ and $f_2$ in the
binary logit model

\[
P(Y = 1 \,|\, X = x) = L[f_1(x^1) + f_2(x^2)],
\]

where L is the cumulative logistic distribution function

\[
L(v) = e^v/[1 + e^v], \qquad -\infty < v < \infty.
\]

The experiments with d = 5 consist of estimating $f_1$ and $f_2$ in the binary
logit model

\[
P(Y = 1 \,|\, X = x) = L\Big[f_1(x^1) + f_2(x^2) + \sum_{j=3}^{5} x^j\Big].
\]

In all of the experiments,

\[
f_1(x) = \sin(\pi x) \quad\text{and}\quad f_2(x) = \Phi(3x),
\]

where $\Phi$ is the standard normal distribution function. The components of X
are independently distributed as U[-1, 1]. Estimation is carried out under
the assumption that the additive components have two (but not necessarily
more) continuous derivatives. Under this assumption, the two-stage estimator
has the rate of convergence $n^{-2/5}$. The LH estimator has this rate of
convergence if d = 2 but not if d = 5.
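
The simulation design described above can be sketched in a few lines. This is a minimal illustration, not the authors' GAUSS code: the function names (`simulate`, `f1`, `f2`), the seed handling and the use of NumPy are assumptions made here for concreteness.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z) / math.sqrt(2.0)))

def logistic(v):
    """Cumulative logistic distribution function L(v) = e^v / (1 + e^v)."""
    return 1.0 / (1.0 + np.exp(-v))

def f1(x):
    return np.sin(np.pi * x)

def f2(x):
    return std_normal_cdf(3.0 * x)

def simulate(n=500, d=2, seed=0):
    """Draw one sample from the binary logit design with d = 2 or d = 5.

    Components 3, ..., d of X (if any) enter the index linearly.
    Returns the covariates X, the binary responses Y and P(Y = 1 | X).
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, d))          # independent U[-1, 1] components
    index = f1(X[:, 0]) + f2(X[:, 1]) + X[:, 2:].sum(axis=1)
    p = logistic(index)
    Y = (rng.uniform(size=n) < p).astype(int)        # Bernoulli(p) draws
    return X, Y, p
```

Calling `simulate(n=500, d=5)` gives one replication of the d = 5 design; repeating the call with different seeds gives the Monte Carlo replications.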

B-splines were used for the first stage of the two-stage estimator. The
kernel used for the second stage and for the LH estimator is

\[
K(v) = \tfrac{15}{16}(1 - v^2)^2 I(|v| \le 1).
\]
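
As a quick numerical check, and assuming the kernel in question is the biweight (quartic) kernel $(15/16)(1 - v^2)^2$ on $[-1, 1]$, the sketch below verifies that it integrates to one and computes its second moment. The grid size and the hand-written trapezoid rule are choices made here only for illustration.

```python
import numpy as np

def biweight(v):
    """Biweight kernel (15/16)(1 - v^2)^2 on [-1, 1], zero elsewhere."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= 1.0, (15.0 / 16.0) * (1.0 - v**2) ** 2, 0.0)

def trapezoid(values, step):
    """Trapezoid rule on an equally spaced grid."""
    return step * (values.sum() - 0.5 * (values[0] + values[-1]))

grid = np.linspace(-1.0, 1.0, 200001)
step = grid[1] - grid[0]

mass = trapezoid(biweight(grid), step)                    # integral of K
second_moment = trapezoid(grid**2 * biweight(grid), step)  # integral of v^2 K(v)
```

Under this assumption the second moment equals 1/7 analytically, which is the kind of kernel constant that enters the bias of kernel-smoothing estimators.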

Experiments were carried out using both local-constant and local-linear esti- 
mators in the second stage of the two-stage method. There were 1000 Monte 
Carlo replications per experiment with the two-stage estimator but only 
500 replications with the LH estimator because of the very long computing 
times it entails. The experiments were carried out in GAUSS using GAUSS 
random number generators. 

The results of the experiments are summarized in Table 1, which shows 
the empirical integrated mean-square errors (EIMSEs) of the estimators at 
the values of the tuning parameters that minimize the EIMSEs. Lengthy 
computing times precluded using data-based methods for selecting tuning 
parameters in the experiments. The EIMSEs of the local-constant and
local-linear two-stage estimates of $f_1$ are considerably smaller than the EIMSEs
of the LH estimator. The EIMSEs of the local-constant and LH estimators
of $f_2$ are approximately equal whereas the local-linear estimator of $f_2$ has
a larger EIMSE. There is little difference between the EIMSEs of the
two-stage local-linear and infeasible oracle estimators. This result is consistent
with the oracle property of the two-stage estimator. 
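
The paper does not spell out how the EIMSE was computed, so the following is only a sketch under stated assumptions: each replication's estimate is evaluated on a common grid over [-1, 1], the squared error is integrated with the trapezoid rule, and the result is averaged over replications. The function names are hypothetical.

```python
import numpy as np

def integrated_squared_error(estimate, truth, grid):
    """Trapezoid-rule integral of (estimate - truth)^2 over the grid."""
    sq = (np.asarray(estimate) - np.asarray(truth)) ** 2
    return float(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(grid)))

def eimse(estimates, truth, grid):
    """Average integrated squared error over Monte Carlo replications.

    `estimates` is a sequence with one array of fitted values per replication,
    each evaluated on `grid`.
    """
    return float(np.mean([integrated_squared_error(e, truth, grid) for e in estimates]))
```

For example, two replications that are off by a constant 0.1 over [-1, 1] each have integrated squared error 0.01 x 2 = 0.02, and so does their average.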

6. Conclusions. This paper has described an estimator of the additive 
components of a nonparametric additive model with a known link function. 
The approach is very general and may be applicable to a wide variety of 
other models. The estimator is asymptotically normally distributed and has 
a pointwise rate of convergence in probability of $n^{-2/5}$ when the unknown
functions are twice continuously differentiable, regardless of the dimension
of the explanatory variable X. In contrast, achieving the rate of convergence
$n^{-2/5}$ with the only other currently available estimator for this model
requires the additive components to have an increasing number of deriva- 
tives as the dimension of X increases. In addition, the new estimator has 
an oracle property: the asymptotic distribution of the estimator of each ad- 
ditive component is the same as it would be if the other components were 
known. 

7. Proofs of theorems. Assumptions A1-A7 hold throughout this sec- 
tion. 
Table 1

Results of Monte Carlo experiments*

                                                              Empirical IMSE
Estimator                           κ_1   κ_2   h_1   h_2     f_1      f_2

d = 2
LH                                              0.9   0.9    0.116    0.015
Two-stage with local-constant
  smoothing                                     0.4   0.9    0.052    0.015
Two-stage with local-linear
  smoothing                                     0.5   1.4    0.052    0.023
Infeasible oracle estimator                     0.6   1.7    0.056    0.021

d = 5
LH                                              1.0   1.0    0.145    0.019
Two-stage with local-constant
  smoothing                                     0.4   0.9    0.060    0.018
Two-stage with local-linear
  smoothing                                     0.6   1.3    0.057    0.029
Infeasible oracle estimator                     0.6   2.0    0.057    0.023

*In the two-stage estimator, κ_j and h_j (j = 1, 2) are the series length and
bandwidth used to estimate f_j. In the LH estimator, h_j (j = 1, 2) is the
bandwidth used to estimate f_j. The values of κ_1, κ_2, h_1 and h_2 minimize the IMSEs
of the estimates.

7.1. Theorem 1. This section begins with lemmas that are used to prove 
Theorem 1. 

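Lemma 1. There are constants $a > 0$ and $C < \infty$ such that

\[
P\Big[\sup_{\theta\in\Theta_\kappa} |S_{n\kappa}(\theta) - ES_{n\kappa}(\theta)| > \varepsilon\Big] \le C\exp(-na\varepsilon^2)
\]

for any sufficiently small $\varepsilon > 0$ and all sufficiently large n.

Proof. Write

\[
S_{n\kappa}(\theta) = n^{-1}\sum_{i=1}^{n} Y_i^2 - 2S_{n\kappa 1}(\theta) + S_{n\kappa 2}(\theta),
\]

where

\[
S_{n\kappa 1}(\theta) = n^{-1}\sum_{i=1}^{n} Y_i F[P_\kappa(X_i)'\theta]
\quad\text{and}\quad
S_{n\kappa 2}(\theta) = n^{-1}\sum_{i=1}^{n} F[P_\kappa(X_i)'\theta]^2.
\]

It suffices to prove that

\[
P\Big[\sup_{\theta\in\Theta_\kappa} |S_{n\kappa j}(\theta) - ES_{n\kappa j}(\theta)| > \varepsilon\Big] \le \tilde C\exp(-na\varepsilon^2) \qquad (j = 1, 2)
\]

for any $\varepsilon > 0$, some $\tilde C < \infty$ and all sufficiently large n. The proof is given
only for j = 1. Similar arguments apply when j = 2.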

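Define $\tilde S_{n\kappa 1}(\theta) = S_{n\kappa 1}(\theta) - ES_{n\kappa 1}(\theta)$. Divide $\Theta_\kappa$ into hypercubes of
edge-length $\ell$. Let $\Theta_\kappa^{(1)}, \ldots, \Theta_\kappa^{(M)}$ denote the $M = (2C_\theta/\ell)^{d(\kappa)}$ cubes thus created.
Let $\theta_{\kappa j}$ be the point at the center of $\Theta_\kappa^{(j)}$. The maximum distance between
$\theta_{\kappa j}$ and any other point in $\Theta_\kappa^{(j)}$ is $r = d(\kappa)^{1/2}\ell/2$, and
$M = \exp\{d(\kappa)[\log(C_\theta/r) + (1/2)\log d(\kappa)]\}$. Now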



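\[
\Big\{\sup_{\theta\in\Theta_\kappa} |\tilde S_{n\kappa 1}(\theta)| > \varepsilon\Big\}
\subset \bigcup_{j=1}^{M}\Big\{\sup_{\theta\in\Theta_\kappa^{(j)}} |\tilde S_{n\kappa 1}(\theta)| > \varepsilon\Big\}.
\]

Therefore,

\[
P_n \equiv P\Big[\sup_{\theta\in\Theta_\kappa} |\tilde S_{n\kappa 1}(\theta)| > \varepsilon\Big]
\le \sum_{j=1}^{M} P\Big[\sup_{\theta\in\Theta_\kappa^{(j)}} |\tilde S_{n\kappa 1}(\theta)| > \varepsilon\Big].
\]

Now for $\theta \in \Theta_\kappa^{(j)}$,

\[
|\tilde S_{n\kappa 1}(\theta)| \le |\tilde S_{n\kappa 1}(\theta_{\kappa j})| + |\tilde S_{n\kappa 1}(\theta) - \tilde S_{n\kappa 1}(\theta_{\kappa j})|
\le |\tilde S_{n\kappa 1}(\theta_{\kappa j})| + C_{F2}\zeta_\kappa r\Big(C_{F1} + n^{-1}\sum_{i=1}^{n}|Y_i|\Big)
\]

for all sufficiently large κ and, therefore, n. Therefore, for all sufficiently
large n,

\[
P\Big[\sup_{\theta\in\Theta_\kappa^{(j)}} |\tilde S_{n\kappa 1}(\theta)| > \varepsilon\Big]
\le P[|\tilde S_{n\kappa 1}(\theta_{\kappa j})| > \varepsilon/2]
+ P\Big[C_{F2}\zeta_\kappa r\Big(C_{F1} + n^{-1}\sum_{i=1}^{n}|Y_i|\Big) > \varepsilon/2\Big].
\]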



Choose $r = \zeta_\kappa^{-2}$. Then $\varepsilon/2 - C_{F2}\zeta_\kappa r[C_{F1} + E(|Y|)] > \varepsilon/4$ for all sufficiently
large κ. Moreover,

\[
P\Big[C_{F2}\zeta_\kappa r\, n^{-1}\sum_{i=1}^{n}(|Y_i| - E|Y|) > \varepsilon/4\Big] \le 2\exp(-a_1 n\varepsilon^2\zeta_\kappa^2)
\]

for some constant $a_1 > 0$ and all sufficiently large κ by Bernstein's inequality
[Bosq (1998), page 22]. Also by Bernstein's inequality, there is a constant
$a_2 > 0$ such that

\[
P[|\tilde S_{n\kappa 1}(\theta_{\kappa j})| > \varepsilon/2] \le 2\exp(-a_2 n\varepsilon^2)
\]

for all n, κ and j. Therefore,

\[
P_n \le 2[M\exp(-a_2 n\varepsilon^2) + \exp(-a_1 n\varepsilon^2)]
\le 2\exp\{-a_2 n\varepsilon^2 + 2dC_\kappa n^{\gamma}[\log(C_\theta/r) + \tfrac{1}{2}\log(2C_\kappa d) + \tfrac{1}{2}\gamma\log n]\} + 2\exp(-a_1 n\varepsilon^2),
\]

where $\gamma = 4/15 + \nu$. It follows that $P_n \le 4\exp(-an\varepsilon^2)$ for a suitable $a > 0$
and all sufficiently large n. □

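Define

\[
S_\kappa(\theta) = E[S_{n\kappa}(\theta)]
\]

and

\[
\tilde\theta_\kappa = \arg\min_{\theta\in\Theta_\kappa} S_\kappa(\theta).
\]

Lemma 2. For any $\eta > 0$, $S_\kappa(\hat\theta_{n\kappa}) - S_\kappa(\tilde\theta_\kappa) < \eta$ almost surely for all
sufficiently large n.

Proof. For each κ, let $\mathcal{N}_\kappa \subset \mathbb{R}^{d(\kappa)}$ be an open set containing $\tilde\theta_\kappa$. Let $\bar{\mathcal{N}}_\kappa$
denote the complement of $\mathcal{N}_\kappa$ in $\Theta_\kappa$. Define $T_\kappa = \bar{\mathcal{N}}_\kappa \cap \Theta_\kappa$. Then $T_\kappa \subset \mathbb{R}^{d(\kappa)}$
is compact. Define

\[
\eta = \min_{\theta\in T_\kappa} S_\kappa(\theta) - S_\kappa(\tilde\theta_\kappa).
\]

Let $A_n$ be the event $|S_{n\kappa}(\theta) - S_\kappa(\theta)| < \eta/2$ for all $\theta \in \Theta_\kappa$. Then

\[
A_n \Rightarrow S_\kappa(\hat\theta_{n\kappa}) < S_{n\kappa}(\hat\theta_{n\kappa}) + \eta/2
\]

and

\[
A_n \Rightarrow S_{n\kappa}(\tilde\theta_\kappa) < S_\kappa(\tilde\theta_\kappa) + \eta/2.
\]

But $S_{n\kappa}(\hat\theta_{n\kappa}) \le S_{n\kappa}(\tilde\theta_\kappa)$ by definition, so

\[
A_n \Rightarrow S_\kappa(\hat\theta_{n\kappa}) < S_{n\kappa}(\tilde\theta_\kappa) + \eta/2.
\]

Therefore,

\[
A_n \Rightarrow S_\kappa(\hat\theta_{n\kappa}) < S_\kappa(\tilde\theta_\kappa) + \eta \Rightarrow S_\kappa(\hat\theta_{n\kappa}) - S_\kappa(\tilde\theta_\kappa) < \eta.
\]

So $A_n \Rightarrow \hat\theta_{n\kappa} \in \mathcal{N}_\kappa$. Since $\mathcal{N}_\kappa$ is arbitrary, the result follows from Lemma 1
and Theorem 1.3.4 of Serfling [(1980), page 10]. □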

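Define

\[
b_{\kappa 0}(x) = \mu + m(x) - P_\kappa(x)'\theta_{\kappa 0}
\]

and

\[
S_{\kappa 0}(\theta) = E\{Y - F[P_\kappa(X)'\theta + b_{\kappa 0}(X)]\}^2.
\]

Then

\[
\theta_{\kappa 0} = \arg\min_{\theta\in\Theta_\kappa} S_{\kappa 0}(\theta).
\]

Lemma 3. For any $\eta > 0$, $S_{\kappa 0}(\tilde\theta_\kappa) - S_{\kappa 0}(\theta_{\kappa 0}) < \eta$ for all sufficiently large n.

Proof. Observe that $|S_\kappa(\theta) - S_{\kappa 0}(\theta)| \to 0$ as $n \to \infty$ uniformly over
$\theta \in \Theta_\kappa$ because $b_{\kappa 0}(x) \to 0$ for almost every $x \in \mathcal{X}$. For each κ, let $\mathcal{N}_\kappa \subset \mathbb{R}^{d(\kappa)}$
be an open set containing $\theta_{\kappa 0}$. Define $T_\kappa = \bar{\mathcal{N}}_\kappa \cap \Theta_\kappa$. Then $T_\kappa \subset \mathbb{R}^{d(\kappa)}$ is
compact. Define

\[
\eta = \min_{\theta\in T_\kappa} S_{\kappa 0}(\theta) - S_{\kappa 0}(\theta_{\kappa 0}).
\]

By choosing a sufficiently small $\mathcal{N}_\kappa$, η can be made arbitrarily small. Choose n
and, therefore, κ large enough that $|S_\kappa(\theta) - S_{\kappa 0}(\theta)| < \eta/2$ for all $\theta \in \Theta_\kappa$. Now
proceed as in the proof of Lemma 2. □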

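Define $Z_{\kappa i} = F'[\mu + m(X_i)]P_\kappa(X_i)$ and $\hat Q_\kappa = n^{-1}\sum_{i=1}^{n} Z_{\kappa i}Z_{\kappa i}'$. Then
$Q_\kappa = E\hat Q_\kappa$. Let $Z_{\kappa i}^{k}$ $(k = 1, \ldots, d(\kappa))$ denote the kth component of $Z_{\kappa i}$.
Let $Z_\kappa$ denote the $n \times d(\kappa)$ matrix whose (i, k) element is $Z_{\kappa i}^{k}$.

Lemma 4. $\|\hat Q_\kappa - Q_\kappa\|^2 = O_p(\kappa^2/n)$.

Proof. Let $Q_{kj}$ denote the (k, j) element of $Q_\kappa$. Then

\[
\begin{aligned}
E\|\hat Q_\kappa - Q_\kappa\|^2
&= \sum_{k=1}^{d(\kappa)}\sum_{j=1}^{d(\kappa)} E\Big(n^{-1}\sum_{i=1}^{n} Z_{\kappa i}^{k}Z_{\kappa i}^{j} - Q_{kj}\Big)^2 \\
&= \sum_{k=1}^{d(\kappa)}\sum_{j=1}^{d(\kappa)} E\Big(n^{-2}\sum_{i=1}^{n}\sum_{\ell=1}^{n} Z_{\kappa i}^{k}Z_{\kappa i}^{j}Z_{\kappa\ell}^{k}Z_{\kappa\ell}^{j} - Q_{kj}^2\Big) \\
&= \sum_{k=1}^{d(\kappa)}\sum_{j=1}^{d(\kappa)} E\,n^{-2}\sum_{i=1}^{n}(Z_{\kappa i}^{k})^2(Z_{\kappa i}^{j})^2 - n^{-1}\sum_{k=1}^{d(\kappa)}\sum_{j=1}^{d(\kappa)} Q_{kj}^2 \\
&\le n^{-1}E\Big[\sum_{k=1}^{d(\kappa)}\sum_{j=1}^{d(\kappa)}(Z_{\kappa i}^{k})^2(Z_{\kappa i}^{j})^2\Big] = O(\kappa^2/n).
\end{aligned}
\]

The lemma now follows from Markov's inequality. □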

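Define $\gamma_n = I(\lambda_{\kappa,\min} \ge c_\lambda/2)$, where I is the indicator function. Let
$U = (U_1, \ldots, U_n)'$.

Lemma 5. $\gamma_n\|\hat Q_\kappa^{-1}Z_\kappa' U/n\| = O_p(\kappa^{1/2}/n^{1/2})$ as $n \to \infty$.

Proof. For any $x \in \mathcal{X}$,

\[
\begin{aligned}
n^{-2}E(\gamma_n\|\hat Q_\kappa^{-1/2}Z_\kappa' U\|^2 \,|\, X = x)
&= n^{-2}\gamma_n E(U'Z_\kappa\hat Q_\kappa^{-1}Z_\kappa' U \,|\, X = x) \\
&= n^{-2}\gamma_n E[\mathrm{Trace}(Z_\kappa\hat Q_\kappa^{-1}Z_\kappa' UU') \,|\, X = x] \\
&\le n^{-2}\gamma_n C_V\,\mathrm{Trace}(\hat Q_\kappa^{-1}Z_\kappa' Z_\kappa) \\
&= n^{-1}C_V\gamma_n d(\kappa) \le C\kappa/n
\end{aligned}
\]

for some constant $C < \infty$. Therefore, $\gamma_n\|\hat Q_\kappa^{-1/2}Z_\kappa' U/n\| = O_p(\kappa^{1/2}/n^{1/2})$ by
Markov's inequality. Now

\[
\gamma_n\|\hat Q_\kappa^{-1}Z_\kappa' U/n\| = \gamma_n[(U'Z_\kappa/n)\hat Q_\kappa^{-1/2}\hat Q_\kappa^{-1}\hat Q_\kappa^{-1/2}(Z_\kappa' U/n)]^{1/2}.
\]

Define $\xi = \hat Q_\kappa^{-1/2}Z_\kappa' U/n$. Let $\eta_1, \ldots, \eta_{d(\kappa)}$ and $q_1, \ldots, q_{d(\kappa)}$ denote the
eigenvalues and eigenvectors of $\hat Q_\kappa^{-1}$. Let $\eta_{\max} = \max(\eta_1, \ldots, \eta_{d(\kappa)})$. The spectral
decomposition of $\hat Q_\kappa^{-1}$ gives $\hat Q_\kappa^{-1} = \sum_{\ell=1}^{d(\kappa)}\eta_\ell q_\ell q_\ell'$, so

\[
\gamma_n\|\hat Q_\kappa^{-1}Z_\kappa' U/n\|^2 = \gamma_n\sum_{\ell=1}^{d(\kappa)}\eta_\ell(\xi' q_\ell)^2
\le \gamma_n\eta_{\max}\sum_{\ell=1}^{d(\kappa)}(\xi' q_\ell)^2 = \gamma_n\eta_{\max}\xi'\xi = O_p(\kappa/n). \quad\Box
\]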
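
Two algebraic steps in the proof of Lemma 5 can be checked numerically: the identity Trace(Q̂⁻¹Z'Z) = n d(κ), and the bound of ‖Q̂⁻¹Z'U/n‖² by η_max ‖Q̂^{-1/2}Z'U/n‖². The sketch below uses an arbitrary Gaussian matrix in place of $Z_\kappa$; the dimensions and the random inputs are assumptions chosen only to exercise the algebra.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_kappa = 400, 7

Z = rng.normal(size=(n, d_kappa))      # stand-in for the n x d(kappa) matrix Z_kappa
U = rng.normal(size=n)                 # stand-in for the error vector U

Q_hat = Z.T @ Z / n                    # Q_hat = n^{-1} sum_i Z_i Z_i'
Q_inv = np.linalg.inv(Q_hat)

# Trace identity: Q_hat^{-1} Z'Z = n * Q_hat^{-1} Q_hat = n * I
trace_value = np.trace(Q_inv @ Z.T @ Z)

# Spectral bound: v' Q^{-2} v <= eta_max * v' Q^{-1} v, with v = Z'U / n
v = Z.T @ U / n
lhs = v @ Q_inv @ Q_inv @ v                       # ||Q_hat^{-1} v||^2
eta_max = 1.0 / np.linalg.eigvalsh(Q_hat).min()   # largest eigenvalue of Q_hat^{-1}
rhs = eta_max * (v @ Q_inv @ v)                   # eta_max * ||Q_hat^{-1/2} v||^2
```

The indicator $\gamma_n$ in the lemma plays the role of guaranteeing that `eta_max` stays bounded.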

Define

\[
B_n = \hat Q_\kappa^{-1}n^{-1}\sum_{i=1}^{n} F'[\mu + m(X_i)]^2 P_\kappa(X_i)b_{\kappa 0}(X_i).
\]

Lemma 6. $\|B_n\| = O(\kappa^{-2})$ with probability approaching 1 as $n \to \infty$.

Proof. Let ξ be the $n \times 1$ vector whose ith component is
$F'[\mu + m(X_i)]b_{\kappa 0}(X_i)$. Then $B_n = \hat Q_\kappa^{-1}Z_\kappa'\xi/n$, and $\gamma_n\|B_n\|^2 = n^{-2}\gamma_n\xi' Z_\kappa\hat Q_\kappa^{-2}Z_\kappa'\xi$.
Therefore, by the same arguments used to prove Lemma 5, $\gamma_n\|B_n\|^2 \le
Cn^{-1}\gamma_n\xi'\xi = \gamma_n O(\kappa^{-4})$. The lemma follows from the fact that $P(\gamma_n = 1) \to 1$
as $n \to \infty$. □

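Proof of Theorem 1. To prove part (a), write

\[
\begin{aligned}
S_{\kappa 0}(\hat\theta_{n\kappa}) - S_{\kappa 0}(\theta_{\kappa 0})
&= [S_{\kappa 0}(\hat\theta_{n\kappa}) - S_\kappa(\hat\theta_{n\kappa})] + [S_\kappa(\hat\theta_{n\kappa}) - S_\kappa(\tilde\theta_\kappa)] \\
&\quad + [S_\kappa(\tilde\theta_\kappa) - S_{\kappa 0}(\tilde\theta_\kappa)] + [S_{\kappa 0}(\tilde\theta_\kappa) - S_{\kappa 0}(\theta_{\kappa 0})].
\end{aligned}
\tag{7.1}
\]

Given any $\eta > 0$, it follows from Lemmas 2 and 3 and uniform convergence
of $S_\kappa$ to $S_{\kappa 0}$ that each term on the right-hand side of (7.1) is less than $\eta/4$
almost surely for all sufficiently large n. Therefore $S_{\kappa 0}(\hat\theta_{n\kappa}) - S_{\kappa 0}(\theta_{\kappa 0}) < \eta$
almost surely for all sufficiently large n. It follows that $\|\hat\theta_{n\kappa} - \tilde\theta_\kappa\| \to 0$ almost
surely as $n \to \infty$ because $\tilde\theta_\kappa$ uniquely minimizes $S_\kappa$. Part (a) follows because
uniqueness of the series representation of each function $m_j$ implies that
$\|\tilde\theta_\kappa - \theta_{\kappa 0}\| \to 0$ as $n \to \infty$.

To prove the remaining parts of the theorem, observe that $\hat\theta_{n\kappa}$ satisfies
the first-order condition $\partial S_{n\kappa}(\hat\theta_{n\kappa})/\partial\theta = 0$ almost surely for all sufficiently
large n. Define $M_i = \mu + m(X_i)$ and $\Delta M_i = P_\kappa(X_i)'\hat\theta_{n\kappa} - M_i = P_\kappa(X_i)'(\hat\theta_{n\kappa} -
\theta_{\kappa 0}) - b_{\kappa 0}(X_i)$. Then a Taylor series expansion yields

\[
n^{-1}\sum_{i=1}^{n} Z_{\kappa i}U_i - (\hat Q_\kappa + R_{n1})(\hat\theta_{n\kappa} - \theta_{\kappa 0}) + n^{-1}\sum_{i=1}^{n} F'(M_i)Z_{\kappa i}b_{\kappa 0}(X_i) + R_{n2} = 0,
\]

almost surely for all sufficiently large n. $R_{n1}$ is defined by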

n 

Rm = n- 1 Y t {-U i F"{M i ) - Ui[F"{Mi) - F"(Mi)] 

i=l 

+ [\F" (Mi)F' (Mi) + \ F" (Mi ) F" (Mi ) AMi 

+ ^F" (Mi)F" (Mi)(AMi) 2 ]AMi 

- [F" (Mi)F' (Mi) - \F"(Mi)F'(Mi) 

+ F" (Mi ) F" (Mi )b K0 (Xi)] b K0 (Xi)} 
x P K (X t )P K (Xi)', 

where $\tilde M_i$ and $\tilde{\tilde M}_i$ are points between $P_\kappa(X_i)'\hat\theta_{n\kappa}$ and $M_i$. $R_{n2}$ is defined by

n _ 

Rn2 = -n- 1 J2{U i F"(Md + Ui[F"(Mi) - F"(Mi)} 

i=l 
+ [F"(Mi)F'(Mi) - ±F"(MdF J (M i )]b K0 (X i ) 

-\F'\M^F'\M i )b M {X i ) 2 }P K {X i )b M {X i ). 

Now let ξ denote either $\hat Q_\kappa^{-1}Z_\kappa' U/n$ or
$\hat Q_\kappa^{-1}[n^{-1}\sum_{i=1}^{n} F'(M_i)^2 P_\kappa(X_i)b_{\kappa 0}(X_i) + R_{n2}]$. Note that

\[
\Big\|n^{-1}\sum_{i=1}^{n} U_iF''(M_i)P_\kappa(X_i)P_\kappa(X_i)'\Big\|^2 = O_p(\kappa^2/n).
\]



Then 



= ln\\(Q. + Rniy 1 Rnlt\\ 2 

= Trace{7 n [£'i? n i((5 K + R n i)~ 2 R n iC}} 
= O p (||eXi|| 2 ) 

= O p (£0oJK 2 /n+ [ [P K (x)'(§ nK -e K o)] 2 dx + sup\b KO (x)\' 
= O p (£0O p (K 2 /n + k\\0 k - 6 k0 \\ 2 + ^" 3 ). 



nil 



Setting £ = Q K Z'JJ jn and applying Lemma 5 yields \\[(Q K + R 
Q- x \*Z'JJlnf = O p \&ln+{i?ln)\^ 
Ya=i F' (Mi) 2 P K (Xi)b K0 (Xi) + Rn2], then applying Lemma 6 and using the 
result HQ^-R^H =o p (k~ 2 ) yields 



[(Qk + Rui)- 1 -^ 1 ) 



O 



p\ \\ w riK 



n 

1=1 

3 KO \\ 2 /K + l/K b ). 



1 ^F / (M0Z Ki 6 KO (X i )+i?n2 



It follows from these results that

\[
\hat\theta_{n\kappa} - \theta_{\kappa 0}
= n^{-1}\hat Q_\kappa^{-1}\sum_{i=1}^{n} F'[\mu + m(X_i)]P_\kappa(X_i)U_i
+ n^{-1}\hat Q_\kappa^{-1}\sum_{i=1}^{n} F'[\mu + m(X_i)]^2 P_\kappa(X_i)b_{\kappa 0}(X_i) + R_n,
\]

where $\|R_n\| = O_p(\kappa^{3/2}/n + n^{-1/2})$. Part (d) of the theorem now follows from
Lemma 4. Part (b) follows by applying Lemmas 5 and 6 to part (d). Part (c)
follows from part (b) and Assumption A5(iii). □
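7.2. Theorem 2. This section begins with lemmas that are used to prove
Theorem 2. For any $\tilde x \equiv (x^2, \ldots, x^d) \in [-1, 1]^{d-1}$, set $m_{-1}(\tilde x) = m_2(x^2) + \cdots +
m_d(x^d)$, and $\tilde b_{\kappa 0}(\tilde x) = \mu + m_{-1}(\tilde x) - \tilde P_\kappa(\tilde x)'\tilde\theta_{\kappa 0}$, where

\[
\tilde P_\kappa(\tilde x) = [1, 0, \ldots, 0, p_1(x^2), \ldots, p_\kappa(x^2), \ldots, p_1(x^d), \ldots, p_\kappa(x^d)]'
\]

and

\[
\tilde\theta_{\kappa 0} = (\mu, 0, \ldots, 0, \theta_{21}, \ldots, \theta_{2\kappa}, \ldots, \theta_{d1}, \ldots, \theta_{d\kappa})'.
\]

In other words, $\tilde P_\kappa$ and $\tilde\theta_{\kappa 0}$ are obtained by replacing $p_1(x^1), \ldots, p_\kappa(x^1)$ with
zeros in $P_\kappa$ and $\theta_{11}, \ldots, \theta_{1\kappa}$ with zeros in $\theta_{\kappa 0}$. Also define

\[
\delta_{n1}(\tilde x) = n^{-1}\tilde P_\kappa(\tilde x)'Q_\kappa^{-1}\sum_{j=1}^{n} F'[\mu + m(X_j)]P_\kappa(X_j)U_j
\]

and

\[
\delta_{n2}(\tilde x) = n^{-1}\tilde P_\kappa(\tilde x)'Q_\kappa^{-1}\sum_{j=1}^{n} F'[\mu + m(X_j)]^2 P_\kappa(X_j)b_{\kappa 0}(X_j).
\]

For $x^1 \in [-1, 1]$ and for j = 0, 1 define

\[
H_{nj1}(x^1) = (nh)^{-1/2}\sum_{i=1}^{n} F'[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]^2 (X_i^1 - x^1)^j K_h(x^1 - X_i^1)\,\delta_{n1}(\tilde X_i),
\]

\[
H_{nj2}(x^1) = (nh)^{-1/2}\sum_{i=1}^{n} F'[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]^2 (X_i^1 - x^1)^j K_h(x^1 - X_i^1)\,\delta_{n2}(\tilde X_i)
\]

and

\[
H_{nj3}(x^1) = -(nh)^{-1/2}\sum_{i=1}^{n} F'[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]^2 (X_i^1 - x^1)^j K_h(x^1 - X_i^1)\,\tilde b_{\kappa 0}(\tilde X_i).
\]

Let $V(x) = \mathrm{Var}(U \,|\, X = x)$.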

Lemma 7. For j = 0, 1 and k = 1, 2, 3, $H_{njk}(x^1) = o_p(1)$ as $n \to \infty$
uniformly over $x^1 \in [-1, 1]$.

Proof. The proof is given only for j = 0. Similar arguments apply for
j = 1. First consider $H_{n01}(x^1)$. We can write

\[
H_{n01}(x^1) = \sum_{j=1}^{n} a_j(x^1)U_j,
\]

where

\[
a_j(x^1) = n^{-3/2}h^{-1/2}\sum_{i=1}^{n} F'[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]^2 K_h(x^1 - X_i^1)\tilde P_\kappa(\tilde X_i)'Q_\kappa^{-1}F'[\mu + m(X_j)]P_\kappa(X_j).
\]

Define

\[
\bar a(x^1) = \int F'[\mu + m_1(x^1) + m_{-1}(\tilde x)]^2 \tilde P_\kappa(\tilde x) f_X(x^1, \tilde x)\,d\tilde x.
\]

Then arguments similar to those used to prove Lemma 1 show that
$a_j(x^1) = (h/n)^{1/2}[\bar a(x^1) + r_n]'Q_\kappa^{-1}F'[\mu + m(X_j)]P_\kappa(X_j)$, where $r_n$ is uncorrelated with
the $U_j$'s and $\|r_n\| = O[(\log n)/(nh)^{1/2}]$ uniformly over $x^1 \in [-1, 1]$ almost
surely. Moreover, for each $x^1 \in [-1, 1]$, the components of $\bar a(x^1)$ are the
Fourier coefficients of a function that is bounded uniformly over $x^1$. Therefore,

\[
\sup_{|x^1|\le 1} \bar a(x^1)'\bar a(x^1) \le M
\tag{7.2}
\]

for some finite constant M and all $\kappa = 1, \ldots, \infty$. It follows from (7.2) and
the Cauchy-Schwarz inequality that

\[
\sup_{|x^1|\le 1}\Big|\bar a(x^1)'(h/n)^{1/2}\sum_{j=1}^{n} U_jQ_\kappa^{-1}F'[\mu + m(X_j)]P_\kappa(X_j)\Big|^2
\le M\Big\|(h/n)^{1/2}\sum_{j=1}^{n} U_jQ_\kappa^{-1}F'[\mu + m(X_j)]P_\kappa(X_j)\Big\|^2.
\]

But

\[
E\Big\|(h/n)^{1/2}\sum_{j=1}^{n} U_jQ_\kappa^{-1}F'[\mu + m(X_j)]P_\kappa(X_j)\Big\|^2 = O(h),
\]

so it follows from Markov's inequality that

\[
\bar a(x^1)'(h/n)^{1/2}\sum_{j=1}^{n} U_jQ_\kappa^{-1}F'[\mu + m(X_j)]P_\kappa(X_j) = O_p(h^{1/2})
\]

uniformly over $x^1 \in [-1, 1]$. This and $\|r_n\| = O[(\log n)/(nh)^{1/2}]$ establish the
conclusion of the lemma for j = 0, k = 1.
We now prove the lemma for j = 0, k = 2. We can write

\[
H_{n02}(x^1) = (nh)^{-1/2}\sum_{i=1}^{n} F'[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]^2 K_h(x^1 - X_i^1)\tilde P_\kappa(\tilde X_i)'B_n,
\]

where

\[
B_n = n^{-1}Q_\kappa^{-1}\sum_{j=1}^{n} F'[\mu + m(X_j)]^2 P_\kappa(X_j)b_{\kappa 0}(X_j).
\]

Arguments like those used to prove Lemma 6 show that $E\|B_n\|^2 = O(\kappa^{-4})$.
Therefore,

\[
\sup_{|x^1|\le 1}|H_{n02}(x^1)| = \sup_{|x^1|\le 1}(nh)^{-1/2}\sum_{i=1}^{n} K_h(x^1 - X_i^1)\cdot O_p(\kappa^{-2})
= O_p(n^{1/2}h^{1/2}\kappa^{-2}) = o_p(1).
\]

For the proof with j = 0, k = 3, note that

\[
\sup_{|x^1|\le 1}|H_{n03}(x^1)| = \sup_{|x^1|\le 1}(nh)^{-1/2}\sum_{i=1}^{n} K_h(x^1 - X_i^1)\cdot O(\kappa^{-2}) = o_p(1). \quad\Box
\]

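Let $P_\kappa(x^1, \tilde X_i) = [1, p_1(x^1), \ldots, p_\kappa(x^1), p_1(X_i^2), \ldots, p_\kappa(X_i^2), \ldots, p_1(X_i^d), \ldots, p_\kappa(X_i^d)]'$
and $b_{\kappa 0}(x^1, \tilde X_i) = \mu + m_1(x^1) + m_{-1}(\tilde X_i) - P_\kappa(x^1, \tilde X_i)'\theta_{\kappa 0}$. Define

\[
\delta_{n3}(x^1, \tilde X_i) = n^{-1}P_\kappa(x^1, \tilde X_i)'Q_\kappa^{-1}\sum_{j=1}^{n} F'[\mu + m(X_j)]P_\kappa(X_j)U_j
\]

and

\[
\delta_{n4}(x^1, \tilde X_i) = n^{-1}P_\kappa(x^1, \tilde X_i)'Q_\kappa^{-1}\sum_{j=1}^{n} F'[\mu + m(X_j)]^2 P_\kappa(X_j)b_{\kappa 0}(X_j).
\]

Also, for j = 0, 1 define, with $F_i''(x^1) \equiv F''[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]$ and
$\Delta F_i(x^1) \equiv F[\mu + m_1(X_i^1) + m_{-1}(\tilde X_i)] - F[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]$,

\[
\begin{aligned}
L_{nj1}(x^1) &= (nh)^{-1/2}\sum_{i=1}^{n} U_iF_i''(x^1)(X_i^1 - x^1)^j K_h(x^1 - X_i^1)\,\delta_{n3}(x^1, \tilde X_i), \\
L_{nj2}(x^1) &= (nh)^{-1/2}\sum_{i=1}^{n} U_iF_i''(x^1)(X_i^1 - x^1)^j K_h(x^1 - X_i^1)\,\delta_{n4}(x^1, \tilde X_i), \\
L_{nj3}(x^1) &= -(nh)^{-1/2}\sum_{i=1}^{n} U_iF_i''(x^1)(X_i^1 - x^1)^j K_h(x^1 - X_i^1)\,b_{\kappa 0}(x^1, \tilde X_i), \\
L_{nj4}(x^1) &= (nh)^{-1/2}\sum_{i=1}^{n} \Delta F_i(x^1)F_i''(x^1)(X_i^1 - x^1)^j K_h(x^1 - X_i^1)\,\delta_{n3}(x^1, \tilde X_i), \\
L_{nj5}(x^1) &= (nh)^{-1/2}\sum_{i=1}^{n} \Delta F_i(x^1)F_i''(x^1)(X_i^1 - x^1)^j K_h(x^1 - X_i^1)\,\delta_{n4}(x^1, \tilde X_i)
\end{aligned}
\]

and

\[
L_{nj6}(x^1) = -(nh)^{-1/2}\sum_{i=1}^{n} \Delta F_i(x^1)F_i''(x^1)(X_i^1 - x^1)^j K_h(x^1 - X_i^1)\,b_{\kappa 0}(x^1, \tilde X_i).
\]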

Lemma 8. As $n \to \infty$, $L_{njk}(x^1) = o_p(1)$ uniformly over $x^1 \in [-1, 1]$ for
each j = 0, 1, k = 1, ..., 6.

Proof. The proof is given only for j = 0. The arguments are similar
for j = 1. By Theorem 1, $\delta_{n4}(x^1, \tilde X_i)$ is the asymptotic bias component of
the stochastic expansion of $P_\kappa(x^1, \tilde X_i)'(\hat\theta_{n\kappa} - \theta_{\kappa 0})$ and is $O_p(\kappa^{-3/2})$ uniformly
over $(x^1, \tilde X_i) \in [-1, 1]^d$. This fact and standard bounds on

\[
\sup_{|x^1|\le 1}\Big|\sum_{i=1}^{n} U_iK_h(x^1 - X_i^1)\Big|
\]

and

\[
\sup_{|x^1|\le 1}\sum_{i=1}^{n} K_h(x^1 - X_i^1)
\]

establish the conclusion of the lemma for j = 0, k = 2, 5. For j = 0, k = 3, 6,
proceed similarly using

\[
\sup_{|x^1|\le 1}|b_{\kappa 0}(x^1, \tilde x)| = O(\kappa^{-2}).
\]

For j = 0, k = 4, one can use arguments similar to those made for $H_{n01}(x^1)$
in the proof of Lemma 7. It remains to consider $L_{n01}(x^1) = D_n(x^1)\tilde B_n$, where

\[
D_n(x^1) = (nh)^{-1/2}\sum_{i=1}^{n} U_iF''[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]K_h(x^1 - X_i^1)P_\kappa(x^1, \tilde X_i)'
\]

and

\[
\tilde B_n = n^{-1}Q_\kappa^{-1}\sum_{j=1}^{n} F'[\mu + m(X_j)]P_\kappa(X_j)U_j.
\]

Now, $E\|\tilde B_n\|^2 = O(\kappa n^{-1})$, and $D_n(x^1)$ contains elements of the form

\[
(nh)^{-1/2}p_r(x^1)\sum_{i=1}^{n} U_iF''[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]K_h(x^1 - X_i^1)
\]

and

\[
(nh)^{-1/2}\sum_{i=1}^{n} U_ip_r(X_i^\ell)F''[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]K_h(x^1 - X_i^1)
\]

for $0 \le r \le \kappa$, $2 \le \ell \le d$. These expressions can be bounded uniformly over
$|x^1| \le 1$ by terms that are $O_p[(\log n)^{1/2}|p_r(x^1)|]$ and $O_p[(\log n)^{1/2}]$, respectively.
This gives

\[
\sup_{|x^1|\le 1}\|D_n(x^1)\|^2 = O_p(\kappa\log n).
\]

Therefore,

\[
\sup_{|x^1|\le 1}|L_{n01}(x^1)|^2 \le \sup_{|x^1|\le 1}\|D_n(x^1)\|^2\|\tilde B_n\|^2 = o_p(1). \quad\Box
\]

Lemma 9. The following hold uniformly over $|x^1| \le 1 - h$:

\[
(nh)^{-1}S''_{n01}(x^1, \tilde m) = D_0(x^1) + o_p(1),
\]

\[
(nh)^{-1}S''_{n21}(x^1, \tilde m) = A_Kh^2D_0(x^1)[1 + o_p(1)]
\]

and

\[
(nh)^{-1}S''_{n11}(x^1, \tilde m) = h^2A_KD_1(x^1)[1 + o_p(1)].
\]

Proof. This follows from Theorem 1(c) and standard bounds on

\[
\sup_{|x^1|\le 1}\Big|\sum_{i=1}^{n} U_i^r(X_i^1 - x^1)^sK_h(x^1 - X_i^1)\Big|
\]

for r = 0, 1, s = 0, 1, 2. □

Define $\Delta m_1(x^1) = \tilde m_1(x^1) - m_1(x^1)$, $\Delta m_{-1}(\tilde x) = \tilde\mu - \mu + \tilde m_{-1}(\tilde x) - m_{-1}(\tilde x)$
and $\Delta m(x^1, \tilde x) = \Delta m_1(x^1) + \Delta m_{-1}(\tilde x)$.

Lemma 10. The following hold uniformly over $|x^1| \le 1 - h$:

(a) $(nh)^{-1/2}S'_{n01}(x^1, \tilde m) = (nh)^{-1/2}S'_{n01}(x^1, m) + (nh)^{1/2}D_0(x^1)\Delta m_1(x^1) + o_p(1)$,

(b) $(nh)^{-1/2}S'_{n11}(x^1, \tilde m) = (nh)^{-1/2}S'_{n11}(x^1, m) + o_p(1)$.

Proof. Only (a) is proved. The proof of (b) is similar. For each i =
1, ..., n, let $m^*(x^1, \tilde X_i)$ and $m^{**}(x^1, \tilde X_i)$ denote quantities that are between
$\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)$ and $\mu + m_1(x^1) + m_{-1}(\tilde X_i)$. The values of $m^*(x^1, \tilde X_i)$
and $m^{**}(x^1, \tilde X_i)$ may be different in different uses. A Taylor series expansion
and Theorem 1(c) give

\[
\begin{aligned}
(nh)^{-1/2}S'_{n01}(x^1, \tilde m)
&= (nh)^{-1/2}S'_{n01}(x^1, m) + \sum_{j=1}^{4} J_{nj}(x^1) + O_p\Big[n(nh)^{-1/2}\sup_{(x^1,\tilde x)\in\mathcal{X}}\|\Delta m(x^1, \tilde x)\|^3\Big] \\
&= (nh)^{-1/2}S'_{n01}(x^1, m) + \sum_{j=1}^{4} J_{nj}(x^1) + o_p(1)
\end{aligned}
\]

uniformly over $|x^1| \le 1 - h$, where

\[
\begin{aligned}
J_{n1}(x^1) &= 2(nh)^{-1/2}\sum_{i=1}^{n} F'[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]^2 K_h(x^1 - X_i^1)\Delta m_1(x^1), \\
J_{n2}(x^1) &= 2(nh)^{-1/2}\sum_{i=1}^{n} F'[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]^2 K_h(x^1 - X_i^1)\Delta m_{-1}(\tilde X_i), \\
J_{n3}(x^1) &= -2(nh)^{-1/2}\sum_{i=1}^{n}\{Y_i - F[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]\}F''[m^*(x^1, \tilde X_i)]K_h(x^1 - X_i^1)\Delta m(x^1, \tilde X_i), \\
J_{n4}(x^1) &= 2(nh)^{-1/2}\sum_{i=1}^{n} F'[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]\{F''[m^*(x^1, \tilde X_i)] + 2F''[m^{**}(x^1, \tilde X_i)]\}K_h(x^1 - X_i^1)[\Delta m(x^1, \tilde X_i)]^2.
\end{aligned}
\]

It follows from Theorem 1(d) and Lemma 7 that $J_{n2}(x^1) = 2\sum_{k=1}^{3} H_{n0k}(x^1) +
o_p(1) = o_p(1)$ uniformly over $|x^1| \le 1 - h$. In addition, it follows from
Theorem 1(c) that for some constant $C < \infty$,

\[
J_{n4}(x^1) \le C(nh)^{-1/2}\sum_{i=1}^{n} K_h(x^1 - X_i^1)\Big[\sup_{(x^1,\tilde x)\in\mathcal{X}}\|\Delta m(x^1, \tilde x)\|\Big]^2
= O_p\Big[(nh)^{1/2}\sup_{(x^1,\tilde x)\in\mathcal{X}}\|\Delta m(x^1, \tilde x)\|^2\Big] = o_p(1)
\]

uniformly over $|x^1| \le 1 - h$. Now consider $J_{n3}(x^1)$. It follows from
Assumption A3(v) that

\[
\begin{aligned}
J_{n3}(x^1)
&= -2(nh)^{-1/2}\sum_{i=1}^{n}\{Y_i - F[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]\}F''[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]K_h(x^1 - X_i^1)\Delta m(x^1, \tilde X_i) \\
&\quad + O_p\Big[(nh)^{1/2}\sup_{(x^1,\tilde x)\in\mathcal{X}}|\Delta m(x^1, \tilde x)|^2\Big] \\
&= \sum_{k=1}^{6} L_{n0k}(x^1) + O_p\Big[(nh)^{1/2}\sup_{(x^1,\tilde x)\in\mathcal{X}}|\Delta m(x^1, \tilde x)|^2\Big] + o_p(1)
\end{aligned}
\]

uniformly over $|x^1| \le 1 - h$. Therefore, $J_{n3}(x^1) = o_p(1)$ uniformly by Lemma 8
and Theorem 1(c), and

\[
(nh)^{-1/2}S'_{n01}(x^1, \tilde m) = (nh)^{-1/2}S'_{n01}(x^1, m) + J_{n1}(x^1) + o_p(1)
\]

uniformly over $|x^1| \le 1 - h$.

Now consider $J_{n1}(x^1)$. Set

\[
\tilde J_{n1}(x^1) = 2(nh)^{-1/2}\sum_{i=1}^{n} F'[\mu + m_1(x^1) + m_{-1}(\tilde X_i)]^2 K_h(x^1 - X_i^1).
\]

It follows from Theorem 2.37 of Pollard (1984) that $\tilde J_{n1}(x^1) - E[\tilde J_{n1}(x^1)] =
o(\log n)$ almost surely as $n \to \infty$. In addition, $E[(nh)^{-1/2}\tilde J_{n1}(x^1)] = D_0(x^1) +
O(h^2)$. Therefore,

\[
J_{n1}(x^1) = (nh)^{1/2}D_0(x^1)\Delta m_1(x^1) + O[\log n\,\Delta m_1(x^1)]
= (nh)^{1/2}D_0(x^1)\Delta m_1(x^1) + o_p(1)
\]

uniformly over $|x^1| \le 1 - h$. □

Proof of Theorem 2. By the definition of $\hat m_1(x^1)$,

\[
\hat m_1(x^1) - m_1(x^1) = \tilde m_1(x^1) - m_1(x^1)
- \frac{S''_{n21}(x^1, \tilde m)S'_{n01}(x^1, \tilde m) - S''_{n11}(x^1, \tilde m)S'_{n11}(x^1, \tilde m)}{S''_{n01}(x^1, \tilde m)S''_{n21}(x^1, \tilde m) - S''_{n11}(x^1, \tilde m)^2}.
\tag{7.3}
\]

Part (a) follows by applying Lemmas 9 and 10 to the right-hand side of (7.3).
Define

\[
w = [nhD_0(x^1)]^{-1}\{-S'_{n01}(x^1, m) + [D_1(x^1)/D_0(x^1)]S'_{n11}(x^1, m)\}.
\]

Methods identical to those used to establish asymptotic normality of
local-linear estimators show that $E(n^{2/5}w) = \beta_1(x^1) + o(1)$, $\mathrm{Var}(n^{2/5}w) = V_1(x^1) +
o(1)$ and $n^{2/5}[\hat m_1(x^1) - m_1(x^1)]$ is asymptotically normal, which proves part (b).
□
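
The update in (7.3) has the form of a single Newton step on a kernel-weighted least-squares criterion, started from the first-stage estimates with zero local slope. The sketch below illustrates that step for a logistic link. It rests on assumptions not restated in this section: $S'_{nj1}$ and $S''_{nj1}$ are taken to be the gradient and Hessian sums of $\sum_i\{Y_i - F[\tilde\mu + \tilde m_1(x^1) + \tilde m_{-1}(\tilde X_i)]\}^2K_h(x^1 - X_i^1)$ weighted by $(X_i^1 - x^1)^j$, $K_h(v)$ is taken to be $K(v/h)$ with a biweight kernel, and the "pilot" is an artificially perturbed version of the truth rather than a series estimate. All function and variable names are hypothetical.

```python
import math
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def biweight(v):
    return np.where(np.abs(v) <= 1.0, (15.0 / 16.0) * (1.0 - v**2) ** 2, 0.0)

def one_step_local_linear(x1, Y, X1, mu_pilot, m1_pilot_at_x1, m_rest_pilot, h):
    """One Newton step for m_1(x1), in the form of (7.3), with a logistic link F.

    m_rest_pilot[i] is the pilot value of m_{-1} at observation i.
    """
    index = mu_pilot + m1_pilot_at_x1 + m_rest_pilot
    F = logistic(index)
    F1 = F * (1.0 - F)                 # F' for the logistic link
    F2 = F1 * (1.0 - 2.0 * F)          # F'' for the logistic link
    K = biweight((x1 - X1) / h)        # assumed convention K_h(v) = K(v / h)
    resid = Y - F
    dev = X1 - x1

    def S1(j):                         # first-derivative sums S'_{nj1}
        return -2.0 * np.sum(resid * F1 * dev**j * K)

    def S2(j):                         # second-derivative sums S''_{nj1}
        return 2.0 * np.sum(F1**2 * dev**j * K) - 2.0 * np.sum(resid * F2 * dev**j * K)

    numerator = S2(2) * S1(0) - S2(1) * S1(1)
    denominator = S2(0) * S2(2) - S2(1) ** 2
    return m1_pilot_at_x1 - numerator / denominator

# Illustration on simulated data (d = 2, mu = 0), with a deliberately biased pilot.
rng = np.random.default_rng(0)
n, h, x1 = 200_000, 0.15, 0.2
X = rng.uniform(-1.0, 1.0, size=(n, 2))
m2_values = 0.5 * (1.0 + np.vectorize(math.erf)(3.0 * X[:, 1] / math.sqrt(2.0)))
Y = (rng.uniform(size=n) < logistic(np.sin(np.pi * X[:, 0]) + m2_values)).astype(float)

truth = math.sin(math.pi * x1)
pilot = truth + 0.15                   # pilot value of m_1(x1), off by 0.15
updated = one_step_local_linear(x1, Y, X[:, 0], 0.0, pilot, m2_values, h)
```

Because the numerator and denominator are both built from the same kernel weights, the normalization of $K_h$ cancels in the ratio, so the step does not depend on whether $K_h(v)$ is defined as $K(v/h)$ or $K(v/h)/h$.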



Proof of Theorem 4. It follows from Theorem 2(a) that

\[
n^{4/5}\int_{1-h\le|x^1|\le 1} w(x^1)[\hat m_1(x^1) - m_1(x^1)]^2\,dx^1 = o_p(1).
\]

Now consider

\[
n^{4/5}\int_{|x^1|\le 1-h} w(x^1)[\hat m_1(x^1) - m_1(x^1)]^2\,dx^1.
\]

By replacing the integrand with the expansion of Theorem 2(a), one obtains
a U-statistic in $U_i$ conditional on $X_1, \ldots, X_n$. This U-statistic has vanishing
conditional variance. □



Proof of Theorem 5. Use Theorem 2(a) to replace $\tilde m_1$ with $m_1$ in
the expression for $\hat m_1$. The result now follows from standard methods for
bounding kernel estimators. □